[ClusterLabs] Why not retry a monitor (pacemaker-execd) that got a segmentation fault?
Ulrich.Windl at rz.uni-regensburg.de
Tue Jun 14 08:36:47 EDT 2022
I had a case where a VirtualDomain monitor operation ended in a core dump (actually it was pacemaker-execd that crashed, but it was counted as a "monitor" operation), and the cluster decided to restart the VM. Wouldn't it be worth retrying the monitor operation first?
Chances are that a retried monitor operation returns a better status than a segmentation fault.
Or does the logic just ignore processes dying on signals?
20201202.ba59be712-150300.4.21.1.x86_64 (SLES15 SP3)
Jun 14 14:09:16 h19 systemd-coredump: Process 28786 (pacemaker-execd) of user 0 dumped core.
Jun 14 14:09:16 h19 pacemaker-execd: warning: prm_xen_v04_monitor_600000 terminated with signal: Segmentation fault
Jun 14 14:09:16 h19 pacemaker-controld: error: Result of monitor operation for prm_xen_v04 on h19: Error
Jun 14 14:09:16 h19 pacemaker-controld: notice: Transition 9 action 107 (prm_xen_v04_monitor_600000 on h19): expected 'ok' but got 'error'
Jun 14 14:09:16 h19 pacemaker-schedulerd: notice: * Recover prm_xen_v04 ( h19 )
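
To make the idea concrete, here is a rough sketch of the behaviour I have in mind. It is not how pacemaker-execd works internally; `run_monitor` and `max_signal_retries` are names made up for this illustration. It relies only on the fact that Python's `subprocess` reports a child killed by a signal as a negative return code on POSIX systems, and it retries only in that case, while an ordinary non-zero exit is reported right away.

```python
import signal
import subprocess
import sys


def run_monitor(cmd, max_signal_retries=1):
    """Run cmd and return (returncode, attempts).

    Hypothetical helper: retries only when the child was killed by a
    signal (for example SIGSEGV), never on a normal non-zero exit.
    """
    attempts = 0
    while True:
        attempts += 1
        rc = subprocess.run(cmd).returncode
        # On POSIX, a negative return code -N means "terminated by signal N".
        if rc < 0 and attempts <= max_signal_retries:
            continue  # the monitor itself crashed; give it another chance
        return rc, attempts


if __name__ == "__main__":
    # Stand-in for a monitor script: exits with an ordinary failure code.
    rc, attempts = run_monitor([sys.executable, "-c", "import sys; sys.exit(7)"])
    if rc < 0:
        print(f"still killed by {signal.Signals(-rc).name} after {attempts} attempts")
    else:
        print(f"exit status {rc} after {attempts} attempt(s)")
```

With a policy like this, a single crash of the monitor would trigger one more monitor run, and recovery of the resource would only start if the second run also failed.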