[ClusterLabs] Re: [EXT] Coming in Pacemaker 2.1.2: better display of internal failures

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Wed Oct 20 03:35:56 EDT 2021


>>> Ken Gaillot <kgaillot at redhat.com> wrote on 19.10.2021 at 19:16 in
message
<edb5ca2fd29f3f514679c39a71513fa5a4b2f88d.camel at redhat.com>:
> Hi all,
> 
> I hope to get the first release candidate for Pacemaker 2.1.2 out in a
> couple of weeks.
> 
> One improvement will be in status displays (crm_mon, and the
> crm_resource --force-* options) for failed actions.
> 
> OCF resource agents already have the ability to output an "exit reason"
> for failures. These are displayed in the status, to give more detailed
> information than just "error".
> 
> Now, Pacemaker will set exit reasons for internal failures as well.
> This includes problems such as an agent or systemd unit not being
> installed, timeouts in Pacemaker communication as opposed to the agent
> itself, an agent process being killed by a signal, etc.
> 
> As an example, sending a kill -9 to a running agent monitor would
> previously result in status with no explanation, requiring some log
> diving to figure it out:
> 
>  * rsc1_monitor_60000 on node1 'error' (1): call=188, status='Error', exitreason='', last-rc-change='Fri Sep 24 14:45:02 2021', queued=0ms, exec=0ms
> 
> Now, the exit reason will plainly say what happened:
> 
>  * rsc1_monitor_60000 on node1 'error' (1): call=188, status='Error', exitreason='Process interrupted by signal', last-rc-change='Fri Sep 24 14:45:02 2021', queued=0ms, exec=0ms

Oops: If you detect that a process was terminated by a signal, you also know
_which_ signal; why not log it, then?
And: Do you also detect and log when a core dump was created?

That would just seem logical to me.
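To illustrate the point: on POSIX systems the status word returned by waitpid() that reveals "terminated by a signal" also carries the signal number, and on Linux and the BSDs a core-dump flag. The following is a minimal sketch under those assumptions, not Pacemaker's actual code; the names describe_wait_status() and run_child_killed_by() are hypothetical, and WCOREDUMP is a non-POSIX extension, so it is guarded with #ifdef.

```c
/* _DEFAULT_SOURCE exposes WCOREDUMP and strsignal() with glibc. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: turn a waitpid() status word into an exit reason.
 * Returns what snprintf() returns (length of the full message). */
int describe_wait_status(int status, char *buf, size_t len)
{
    if (WIFEXITED(status)) {
        return snprintf(buf, len, "Process exited with code %d",
                        WEXITSTATUS(status));
    }
    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);   /* the signal number is right here */
        int core = 0;
#ifdef WCOREDUMP
        /* Not POSIX, but provided by Linux and the BSDs. */
        core = WCOREDUMP(status) ? 1 : 0;
#endif
        return snprintf(buf, len, "Process interrupted by signal %d (%s)%s",
                        sig, strsignal(sig), core ? ", core dumped" : "");
    }
    return snprintf(buf, len, "Process status unknown (0x%x)",
                    (unsigned) status);
}

/* Hypothetical demo: fork a child that kills itself with `sig`
 * (like an external `kill -9`), reap it, and describe why it ended. */
int run_child_killed_by(int sig, char *buf, size_t len)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        kill(getpid(), sig);
        _exit(0);                     /* only reached if sig was not fatal */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return describe_wait_status(status, buf, len);
}
```

With glibc, run_child_killed_by(SIGKILL, ...) yields a message like "Process interrupted by signal 9 (Killed)". Note that the core-dump flag only reflects what the kernel reports; whether and where a core file is written depends on the resource limits and core pattern configured on the node.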

Regards,
Ulrich



