[ClusterLabs] Coming in Pacemaker 2.1.2: better display of internal failures

Ken Gaillot kgaillot at redhat.com
Tue Oct 19 13:16:35 EDT 2021


Hi all,

I hope to get the first release candidate for Pacemaker 2.1.2 out in a
couple of weeks.

One improvement will be in status displays (crm_mon, and the
crm_resource --force-* options) for failed actions.

OCF resource agents already have the ability to output an "exit reason"
for failures. These are displayed in the status, to give more detailed
information than just "error".

Now, Pacemaker will set exit reasons for internal failures as well.
These include an agent or systemd unit not being installed, a timeout
in Pacemaker's own communication (as opposed to a timeout in the agent
itself), an agent process being killed by a signal, etc.

As an example, sending a kill -9 to a running agent monitor previously
produced a status entry with an empty exit reason, so you had to dig
through the logs to find the cause:

 * rsc1_monitor_60000 on node1 'error' (1): call=188, status='Error', exitreason='', last-rc-change='Fri Sep 24 14:45:02 2021', queued=0ms, exec=0ms

Now, the exit reason will plainly say what happened:

 * rsc1_monitor_60000 on node1 'error' (1): call=188, status='Error', exitreason='Process interrupted by signal', last-rc-change='Fri Sep 24 14:45:02 2021', queued=0ms, exec=0ms
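
If you want to pick the exit reason out of such a line in a script, a
rough Python sketch follows. The regular expressions are my own guess
based only on the example line above, not a documented format, and
parse_failed_action() is a hypothetical helper; for anything serious,
structured output from the tools is a safer source than scraping text.

```python
import re

# Assumed layout, inferred from the sample line only:
#   * <action> on <node> '<rc text>' (<rc>): key=value, key='value', ...
FAILED_ACTION = re.compile(
    r"\*\s+(?P<action>\S+) on (?P<node>\S+) "
    r"'(?P<rc_text>[^']*)' \((?P<rc>\d+)\):\s*(?P<attrs>.*)"
)
# A key followed by either a single-quoted value or a bare value.
ATTR = re.compile(
    r"(?P<key>[\w-]+)=(?:'(?P<quoted>[^']*)'|(?P<bare>[^,\s]+))"
)


def parse_failed_action(text):
    """Return a dict of fields from one failed-action line, or None."""
    # Undo any line wrapping (as in an email) before matching.
    flat = " ".join(text.split())
    match = FAILED_ACTION.search(flat)
    if match is None:
        return None
    fields = {
        "action": match["action"],
        "node": match["node"],
        "rc_text": match["rc_text"],
        "rc": int(match["rc"]),
    }
    for attr in ATTR.finditer(match["attrs"]):
        quoted = attr["quoted"]
        fields[attr["key"]] = quoted if quoted is not None else attr["bare"]
    return fields


sample = """ * rsc1_monitor_60000 on node1 'error' (1): call=188, status='Error',
exitreason='Process interrupted by signal', last-rc-change='Fri Sep 24
14:45:02 2021', queued=0ms, exec=0ms"""

fields = parse_failed_action(sample)
print(fields["exitreason"])  # Process interrupted by signal
```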

-- 
Ken Gaillot <kgaillot at redhat.com>


