[ClusterLabs] Antw: [EXT] Re: Why not retry a monitor (pacemaker-execd) that got a segmentation fault?

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Wed Jun 15 02:32:26 EDT 2022


>>> Ulrich Windl wrote on 14.06.2022 at 15:53 in message <62A892F0.174 : 161 : 60728>:

...
> Yes, it's odd, but isn't the cluster there precisely to protect us from odd
> situations? ;-)

I have more odd stuff:
Jun 14 20:40:09 rksaph18 pacemaker-execd[7020]:  warning: prm_lockspace_ocfs2_monitor_120000 process (PID 30234) timed out
...
Jun 14 20:40:14 h18 pacemaker-execd[7020]:  crit: prm_lockspace_ocfs2_monitor_120000 process (PID 30234) will not die!
...
Jun 14 20:40:53 h18 pacemaker-controld[7026]:  warning: lrmd IPC request 525 failed: Connection timed out after 5000ms
Jun 14 20:40:53 h18 pacemaker-controld[7026]:  error: Couldn't perform lrmd_rsc_cancel operation (timeout=0): -110: Connection timed out (110)
...
Jun 14 20:42:23 h18 pacemaker-controld[7026]:  error: Couldn't perform lrmd_rsc_exec operation (timeout=90000): -114: Connection timed out (110)
Jun 14 20:42:23 h18 pacemaker-controld[7026]:  error: Operation stop on prm_lockspace_ocfs2 failed: -70
...
Jun 14 20:42:23 h18 pacemaker-controld[7026]:  warning: Input I_FAIL received in state S_NOT_DC from do_lrm_rsc_op
Jun 14 20:42:23 h18 pacemaker-controld[7026]:  notice: State transition S_NOT_DC -> S_RECOVERY
Jun 14 20:42:23 h18 pacemaker-controld[7026]:  warning: Fast-tracking shutdown in response to errors
Jun 14 20:42:23 h18 pacemaker-controld[7026]:  error: Input I_TERMINATE received in state S_RECOVERY from do_recover
Jun 14 20:42:28 h18 pacemaker-controld[7026]:  warning: Sending IPC to lrmd disabled until pending reply received
Jun 14 20:42:28 h18 pacemaker-controld[7026]:  error: Couldn't perform lrmd_rsc_cancel operation (timeout=0): -114: Connection timed out (110)
Jun 14 20:42:33 h18 pacemaker-controld[7026]:  warning: Sending IPC to lrmd disabled until pending reply received
Jun 14 20:42:33 h18 pacemaker-controld[7026]:  error: Couldn't perform lrmd_rsc_cancel operation (timeout=0): -114: Connection timed out (110)
Jun 14 20:42:33 h18 pacemaker-controld[7026]:  notice: Stopped 2 recurring operations at shutdown (0 remaining)
Jun 14 20:42:33 h18 pacemaker-controld[7026]:  error: 3 resources were active at shutdown
Jun 14 20:42:33 h18 pacemaker-controld[7026]:  notice: Disconnected from the executor
Jun 14 20:42:33 h18 pacemaker-controld[7026]:  notice: Disconnected from Corosync
Jun 14 20:42:33 h18 pacemaker-controld[7026]:  notice: Disconnected from the CIB manager
Jun 14 20:42:33 h18 pacemaker-controld[7026]:  error: Could not recover from internal error
Jun 14 20:42:33 h18 pacemakerd[7003]:  error: pacemaker-controld[7026] exited with status 1 (Error occurred)
Jun 14 20:42:33 h18 pacemakerd[7003]:  notice: Stopping pacemaker-schedulerd
Jun 14 20:42:33 h18 pacemaker-schedulerd[7024]:  notice: Caught 'Terminated' signal
Jun 14 20:42:33 h18 pacemakerd[7003]:  notice: Stopping pacemaker-attrd
Jun 14 20:42:33 h18 pacemaker-attrd[7022]:  notice: Caught 'Terminated' signal
Jun 14 20:42:33 h18 pacemakerd[7003]:  notice: Stopping pacemaker-execd
Jun 14 20:42:34 h18 sbd[6856]:  warning: inquisitor_child: pcmk health check: UNHEALTHY
Jun 14 20:42:34 h18 sbd[6856]:  warning: inquisitor_child: Servant pcmk is outdated (age: 41877)
(At this point the node was fenced by SBD.)

Regards,
Ulrich




