[ClusterLabs] Q: warning: new_event_notification (4527-22416-14): Broken pipe (32)

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Fri Dec 18 04:00:34 EST 2020


Hi!

I wonder what "warning: new_event_notification (4527-22416-14): Broken pipe (32)" means. Is it a bug? (This is on SLES15 SP2, BTW.)
It happened after a "crm resource refresh":

Dec 18 09:25:51 h16 pacemaker-controld[4527]:  notice: Forcing the status of all resources to be redetected
Dec 18 09:25:51 h16 pacemaker-attrd[4525]:  notice: Setting last-failure-prm_xen_test-jeos#monitor_600000[h18]: 1608279287 -> (unset)
Dec 18 09:25:51 h16 pacemaker-attrd[4525]:  notice: Setting fail-count-prm_xen_test-jeos#monitor_600000[h18]: 1 -> (unset)
Dec 18 09:25:51 h16 pacemaker-controld[4527]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
Dec 18 09:25:51 h16 pacemaker-controld[4527]:  warning: new_event_notification (4527-22416-14): Broken pipe (32)
...

The reprobe/refresh still seemed to succeed, but I wonder about the warning.

A possibly related question: Do STONITH resources have special rules, meaning they don't wait for fencing to succeed?
I saw the following between fencing being initiated and fencing being confirmed (h16 was the DC; h18 has now become the DC):

Dec 18 09:29:29 h18 pacemaker-controld[4479]:  notice: Processing graph 0 (ref=pe_calc-dc-1608280169-21) derived from /var/lib/pacemaker/pengine/pe-warn-9.bz2
Dec 18 09:29:29 h18 pacemaker-controld[4479]:  notice: Requesting fencing (reboot) of node h16
Dec 18 09:29:29 h18 pacemaker-controld[4479]:  notice: Initiating start operation prm_stonith_sbd_start_0 locally on h18
...
Dec 18 09:31:14 h18 pacemaker-controld[4479]:  error: Node h18 did not send start result (via controller) within 45000ms (action timeout plus cluster-delay)
Dec 18 09:31:14 h18 pacemaker-controld[4479]:  error: [Action   22]: In-flight resource op prm_stonith_sbd_start_0      on h18 (priority: 9900, waiting: (null))
Dec 18 09:31:14 h18 pacemaker-controld[4479]:  notice: Transition 0 aborted: Action lost
Dec 18 09:31:14 h18 pacemaker-controld[4479]:  warning: rsc_op 22: prm_stonith_sbd_start_0 on h18 timed out
...
Dec 18 09:31:15 h18 pacemaker-controld[4479]:  notice: Peer h16 was terminated (reboot) by h18 on behalf of pacemaker-controld.4527: OK
Dec 18 09:31:17 h18 pacemaker-execd[4476]:  notice: prm_stonith_sbd start (call 164) exited with status 0 (execution time 110960ms, queue time 15001ms)
...
Dec 18 09:31:30 h18 pacemaker-controld[4479]:  notice: Peer h16 was terminated (reboot) by h19 on behalf of pacemaker-controld.4479: OK
Dec 18 09:31:30 h18 pacemaker-controld[4479]:  notice: Transition 0 (Complete=31, Pending=0, Fired=0, Skipped=1, Incomplete=3, Source=/var/lib/pacemaker/pengine/pe-warn-9.bz2): Stopped
...
Dec 18 09:31:30 h18 pacemaker-schedulerd[4478]:  warning: Unexpected result (error) was recorded for start of prm_stonith_sbd on h18 at Dec 18 09:31:14 2020
Dec 18 09:31:30 h18 pacemaker-schedulerd[4478]:  notice:  * Recover    prm_stonith_sbd                      (             h18 )
...
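
For reference, the intervals in the h18 excerpt above can be worked out with a minimal Python sketch. The three timestamps are copied from the log lines; the variable and function names are made up for illustration, and the sketch assumes all events fall on the same day.

```python
from datetime import datetime

# Timestamps copied from the h18 log excerpt (all on Dec 18, so only the
# time of day is parsed; this assumes no midnight rollover).
FMT = "%H:%M:%S"

def elapsed_seconds(start: str, end: str) -> int:
    """Return the whole seconds between two HH:MM:SS stamps on the same day."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())

# Hypothetical names for the three events of interest.
fencing_requested = "09:29:29"   # "Requesting fencing (reboot) of node h16"
start_timed_out = "09:31:14"     # "prm_stonith_sbd_start_0 on h18 timed out"
fencing_confirmed = "09:31:15"   # "Peer h16 was terminated (reboot) by h18"

print(elapsed_seconds(fencing_requested, start_timed_out))    # 105
print(elapsed_seconds(fencing_requested, fencing_confirmed))  # 106
```

So the start of prm_stonith_sbd was reported as timed out 105 seconds after it was initiated, one second before the first fencing confirmation was logged.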

Regards,
Ulrich




