[ClusterLabs] pacemaker-fenced /dev/shm errors

Ken Gaillot kgaillot at redhat.com
Mon Mar 27 09:48:13 EDT 2023


On Mon, 2023-03-27 at 14:48 +0800, d tbsky wrote:
> Hi,
>    The cluster is running under RHEL 9.0 elements. Today I saw the log
> report strange errors like the ones below:
> 
> Mar 27 13:07:06.287 example.com pacemaker-fenced    [2405]
> (qb_sys_mmap_file_open)     error: couldn't allocate file
> /dev/shm/qb-2405-2403-12-A9UUaJ/qb-request-stonith-ng-data:
> Interrupted system call (4)
> Mar 27 13:07:06.288 example.com pacemaker-fenced    [2405]
> (qb_rb_open_2)      error: couldn't create file for mmap
> Mar 27 13:07:06.288 example.com pacemaker-fenced    [2405]
> (qb_ipcs_shm_rb_open)       error:
> qb_rb_open:/dev/shm/qb-2405-2403-12-A9UUaJ/qb-request-stonith-ng:
> Interrupted system call (4)
> Mar 27 13:07:06.288 example.com pacemaker-fenced    [2405]
> (qb_ipcs_shm_connect)       error: shm connection FAILED: Interrupted
> system call (4)
> Mar 27 13:07:06.288 example.com pacemaker-fenced    [2405]
> (handle_new_connection)     error: Error in connection setup
> (/dev/shm/qb-2405-2403-12-A9UUaJ/qb): Interrupted system call (4)
> Mar 27 13:07:06.288 example.com pacemakerd          [2403]
> (pcmk__ipc_is_authentic_process_active)     info: Could not connect
> to stonith-ng IPC: Interrupted system call
> Mar 27 13:07:06.288 example.com pacemakerd          [2403]
> (check_active_before_startup_processes)     notice:
> pacemaker-fenced[2405] is unresponsive to ipc after 1 tries
> 
> There are no more "pacemaker-fenced" entries in the log. The cluster
> seems fine, and pacemaker-fenced (process ID 2405) is still
> running. May I assume the cluster is OK and that I don't need to do
> anything, since pacemaker didn't complain further?

I'm glad it's resolved, but for future reference, that does indicate a
serious problem. It means the fencer is not accepting any requests, so
any fencing attempts or even attempts to monitor a fencing device from
that node will fail.

If sbd is in use, it will kick in and reboot the node. However, without
sbd, there is no automated mechanism to deal with the issue.
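
Since nothing reacts automatically without sbd, one option is to watch the
log for the pacemakerd notice quoted above. The following is only a rough
sketch, not something shipped with Pacemaker: the function name
find_unresponsive is made up for illustration, and it assumes each message
is a single line in the real log file (the wrapping in the quoted excerpt
comes from the mail client).

```python
import re

# Matches the pacemakerd notice from the quoted log, for example:
#   "pacemaker-fenced[2405] is unresponsive to ipc after 1 tries"
# Assumption: the message sits on one line in the actual log file.
UNRESPONSIVE = re.compile(
    r"(?P<daemon>pacemaker-\w+)\[(?P<pid>\d+)\] "
    r"is unresponsive to ipc after (?P<tries>\d+) tries"
)


def find_unresponsive(lines):
    """Return a (daemon, pid, tries) tuple for every matching log line."""
    hits = []
    for line in lines:
        match = UNRESPONSIVE.search(line)
        if match:
            hits.append(
                (match.group("daemon"),
                 int(match.group("pid")),
                 int(match.group("tries")))
            )
    return hits


# Sample input: two messages from the quoted log, joined onto single lines.
sample = [
    "Mar 27 13:07:06.288 example.com pacemaker-fenced    [2405] "
    "(qb_rb_open_2)      error: couldn't create file for mmap",
    "Mar 27 13:07:06.288 example.com pacemakerd          [2403] "
    "(check_active_before_startup_processes)     notice: "
    "pacemaker-fenced[2405] is unresponsive to ipc after 1 tries",
]

for daemon, pid, tries in find_unresponsive(sample):
    print(f"{daemon} (pid {pid}) unresponsive to IPC after {tries} tries")
```

To use it on a real node, pass an open log file to find_unresponsive()
instead of the sample list, and raise an alert for any hit so that someone
can look at the node by hand.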
-- 
Ken Gaillot <kgaillot at redhat.com>


