[ClusterLabs] Q: warning: new_event_notification (4527-22416-14): Broken pipe (32)
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Fri Dec 18 04:00:34 EST 2020
Hi!
I wonder what the message "warning: new_event_notification (4527-22416-14): Broken pipe (32)" means. Is it a bug? (This is SLES15 SP2, BTW.)
It happened after a "crm resource refresh":
Dec 18 09:25:51 h16 pacemaker-controld[4527]: notice: Forcing the status of all resources to be redetected
Dec 18 09:25:51 h16 pacemaker-attrd[4525]: notice: Setting last-failure-prm_xen_test-jeos#monitor_600000[h18]: 1608279287 -> (unset)
Dec 18 09:25:51 h16 pacemaker-attrd[4525]: notice: Setting fail-count-prm_xen_test-jeos#monitor_600000[h18]: 1 -> (unset)
Dec 18 09:25:51 h16 pacemaker-controld[4527]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Dec 18 09:25:51 h16 pacemaker-controld[4527]: warning: new_event_notification (4527-22416-14): Broken pipe (32)
...
The reprobe/refresh still seemed to succeed, but I wonder about the warning.
Maybe a related question: Do STONITH resources follow special rules, meaning that they are started without waiting for fencing to succeed?
I saw the following between fencing being initiated and fencing being confirmed (h16 had been DC; h18 then became DC):
Dec 18 09:29:29 h18 pacemaker-controld[4479]: notice: Processing graph 0 (ref=pe_calc-dc-1608280169-21) derived from /var/lib/pacemaker/pengine/pe-warn-9.bz2
Dec 18 09:29:29 h18 pacemaker-controld[4479]: notice: Requesting fencing (reboot) of node h16
Dec 18 09:29:29 h18 pacemaker-controld[4479]: notice: Initiating start operation prm_stonith_sbd_start_0 locally on h18
...
Dec 18 09:31:14 h18 pacemaker-controld[4479]: error: Node h18 did not send start result (via controller) within 45000ms (action timeout plus cluster-delay)
Dec 18 09:31:14 h18 pacemaker-controld[4479]: error: [Action 22]: In-flight resource op prm_stonith_sbd_start_0 on h18 (priority: 9900, waiting: (null))
Dec 18 09:31:14 h18 pacemaker-controld[4479]: notice: Transition 0 aborted: Action lost
Dec 18 09:31:14 h18 pacemaker-controld[4479]: warning: rsc_op 22: prm_stonith_sbd_start_0 on h18 timed out
...
Dec 18 09:31:15 h18 pacemaker-controld[4479]: notice: Peer h16 was terminated (reboot) by h18 on behalf of pacemaker-controld.4527: OK
Dec 18 09:31:17 h18 pacemaker-execd[4476]: notice: prm_stonith_sbd start (call 164) exited with status 0 (execution time 110960ms, queue time 15001ms)
...
Dec 18 09:31:30 h18 pacemaker-controld[4479]: notice: Peer h16 was terminated (reboot) by h19 on behalf of pacemaker-controld.4479: OK
Dec 18 09:31:30 h18 pacemaker-controld[4479]: notice: Transition 0 (Complete=31, Pending=0, Fired=0, Skipped=1, Incomplete=3, Source=/var/lib/pacemaker/pengine/pe-warn-9.bz2): Stopped
...
Dec 18 09:31:30 h18 pacemaker-schedulerd[4478]: warning: Unexpected result (error) was recorded for start of prm_stonith_sbd on h18 at Dec 18 09:31:14 2020
Dec 18 09:31:30 h18 pacemaker-schedulerd[4478]: notice: * Recover prm_stonith_sbd ( h18 )
...
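To put numbers on the sequence above, here is a minimal Python sketch that computes the seconds elapsed since the fencing request. It is not part of any Pacemaker tooling: the event labels and the helper names parse and elapsed_since_request are made up for illustration, the timestamps are copied from the h18 log lines above, and the year 2020 is assumed from the date of this post.

```python
from datetime import datetime

# Timestamps copied from the h18 log excerpt; the labels are illustrative only.
EVENTS = {
    "fencing of h16 requested, sbd start initiated": "Dec 18 09:29:29",
    "sbd start result declared lost (timed out)": "Dec 18 09:31:14",
    "h16 reported terminated by h18": "Dec 18 09:31:15",
    "sbd start exited with status 0": "Dec 18 09:31:17",
    "h16 reported terminated by h19": "Dec 18 09:31:30",
}

REQUEST = "fencing of h16 requested, sbd start initiated"


def parse(stamp, year=2020):
    """Parse a syslog-style stamp; syslog omits the year, so it is assumed."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def elapsed_since_request(events, request=REQUEST):
    """Return seconds elapsed between the request and every listed event."""
    start = parse(events[request])
    return {
        name: int((parse(stamp) - start).total_seconds())
        for name, stamp in events.items()
    }


if __name__ == "__main__":
    for name, seconds in elapsed_since_request(EVENTS).items():
        print(f"{seconds:4d}s  {name}")
```

By these timestamps the start result was declared lost 105 seconds after the operation was initiated, one second before the first fencing confirmation (106 seconds), and the start itself returned at 108 seconds.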
Regards,
Ulrich