[ClusterLabs] pacemaker-remoted /dev/shm errors

Ken Gaillot kgaillot at redhat.com
Mon Mar 6 10:42:20 EST 2023


On Mon, 2023-03-06 at 16:03 +0300, Alexander Epaneshnikov via Users
wrote:
> Hello. We are using pacemaker 2.1.4-5.el8 and are seeing strange errors
> in the logs when a request is made to the cluster.
> 
> Feb 17 08:18:15 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984]
> (handle_new_connection)      error: Error in connection setup
> (/dev/shm/qb-2984-1077673-18-7xR8Y0/qb): Remote I/O error (121)
> Feb 17 08:19:15 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984]
> (handle_new_connection)      error: Error in connection setup
> (/dev/shm/qb-2984-1077927-18-dX5NSt/qb): Remote I/O error (121)
> Feb 17 08:20:16 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984]
> (handle_new_connection)      error: Error in connection setup
> (/dev/shm/qb-2984-1078160-18-RjzD4K/qb): Remote I/O error (121)
> Feb 17 08:21:16 gm-srv-oshv-001.int.cld pacemaker-remoted   [2984]
> (handle_new_connection)      error: Error in connection setup
> (/dev/shm/qb-2984-1078400-18-YyJmJJ/qb): Remote I/O error (121)

The error code (121, which is EREMOTEIO on Linux) is most likely being
returned by one of pacemaker-remoted's qb_ipcs_service_handlers.
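For reference, strerror() renders errno 121 (EREMOTEIO) as "Remote I/O error" on Linux. Here is a minimal sketch of how such a return code turns into the message format in the log lines above. It assumes the callback reports failure as a negative errno, which is the usual libqb convention. format_setup_error() is a hypothetical helper, not a libqb function.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libqb): render a negative errno return
 * code in the same shape as the "Error in connection setup" log lines. */
int format_setup_error(char *buf, size_t len, const char *path, int rc)
{
    int err = -rc; /* assumed convention: failure reported as -errno */

    return snprintf(buf, len, "Error in connection setup (%s): %s (%d)",
                    path, strerror(err), err);
}
```

Calling it with -EREMOTEIO and one of the /dev/shm paths from the report reproduces the logged text on a Linux/glibc system.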

This is the correct behavior when a local client (typically a Pacemaker
command-line tool) attempts to contact the cluster before the cluster
has established a connection to the remote node.

I've also seen it, very rarely, in lab testing just before pacemaker-remoted
successfully accepts a new IPC client. It doesn't seem to have any ill
effect, but I'm not sure why it shows up in that case.

I also occasionally see "Error in connection setup" on full cluster
nodes with "Operation not permitted" instead of "Remote I/O error". In
that case it's generally the correct behavior when a local client
attempts to connect while the cluster is shutting down on the node.
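Below is a rough sketch of the accept-or-reject decision described above. As I understand libqb, the real hook is the connection_accept member of struct qb_ipcs_service_handlers. It returns 0 to accept a client or a negative errno to reject it. The struct node_state type and accept_local_client() are hypothetical stand-ins, not Pacemaker code. For brevity they fold the remote-node case and the full-node shutdown case into one function.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical node state; not Pacemaker's real data structures. */
struct node_state {
    bool cluster_connected; /* has the cluster connected to this node yet? */
    bool shutting_down;     /* is the cluster stopping on this node? */
};

/* Sketch of an IPC accept decision: 0 accepts the local client, while a
 * negative errno rejects it and surfaces as a connection setup error. */
int accept_local_client(const struct node_state *node)
{
    if (node->shutting_down) {
        return -EPERM;     /* shows up as "Operation not permitted" */
    }
    if (!node->cluster_connected) {
        return -EREMOTEIO; /* shows up as "Remote I/O error" */
    }
    return 0;              /* accept the client */
}
```

In both rejection cases, the client simply tried to connect at a moment when the daemon could not serve it. That is why the error by itself does not indicate a problem.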

Pacemaker itself generally logs info- or warning-level messages in these
cases, so I'd prefer that libqb log its message at debug level. However,
I'm not sure that would be appropriate for every possible error.
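One way to express demoting only the expected cases is a small errno-to-priority mapping. This is a hypothetical sketch of the idea, not existing libqb or Pacemaker behavior. setup_error_priority() is an invented name, and the errno list covers only the two cases discussed here.

```c
#include <errno.h>
#include <syslog.h>

/* Hypothetical policy: choose a log priority for a rejected IPC connection.
 * Rejections that Pacemaker already explains at info/warning level are
 * demoted to debug; any other errno stays at error level. */
int setup_error_priority(int rc)
{
    switch (-rc) {
    case EREMOTEIO: /* local client arrived before the cluster connected */
    case EPERM:     /* local client arrived while the cluster was stopping */
        return LOG_DEBUG;
    default:
        return LOG_ERR;
    }
}
```

The open question is the default branch: whether every other errno really deserves error level.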

> 
> Other than that, pacemaker/corosync works fine.
> 
> Any suggestions on the cause of the error, or at least where to start
> debugging, are welcome.
> 
-- 
Ken Gaillot <kgaillot at redhat.com>
