[ClusterLabs] Q: warning: new_event_notification (4527-22416-14): Broken pipe (32)

Andrei Borzenkov arvidjaar at gmail.com
Fri Dec 18 06:17:10 EST 2020


18.12.2020 12:00, Ulrich Windl wrote:
> 
> Maybe a related question: Do STONITH resources have special rules, meaning they don't wait for successful fencing?

Pacemaker resources in the CIB do not perform fencing. They only register
fencing devices with fenced, which does the actual job. In particular ...

> I saw this between fencing being initiated and fencing being confirmed (h16 was DC, now h18 became DC):
> 
> Dec 18 09:29:29 h18 pacemaker-controld[4479]:  notice: Processing graph 0 (ref=pe_calc-dc-1608280169-21) derived from /var/lib/pacemaker/pengine/pe-warn-9.bz2
> Dec 18 09:29:29 h18 pacemaker-controld[4479]:  notice: Requesting fencing (reboot) of node h16
> Dec 18 09:29:29 h18 pacemaker-controld[4479]:  notice: Initiating start operation prm_stonith_sbd_start_0 locally on h18

... the "start" operation on a pacemaker stonith resource only registers
this device with fenced. It does *not* initiate a stonith operation.
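
As a rough illustration of that split, here is a toy model in Python. It is
not pacemaker code: ToyFencer, register_device, fence and
start_stonith_resource are hypothetical names made up for this sketch. It only
shows that "start" registers a device and leaves fencing to a separate request.

```python
class ToyFencer:
    """Hypothetical stand-in for the fencing daemon; not pacemaker code."""

    def __init__(self):
        self.devices = set()      # devices registered by "start" operations
        self.fenced_nodes = []    # nodes that were actually fenced

    def register_device(self, device):
        # Registration only records that the device exists.
        self.devices.add(device)

    def fence(self, node):
        # Fencing is a separate request and needs a registered device.
        if not self.devices:
            raise RuntimeError("no fencing device registered")
        self.fenced_nodes.append(node)


def start_stonith_resource(fencer, device):
    # "start" on the stonith resource: register the device, nothing more.
    fencer.register_device(device)


fencer = ToyFencer()
start_stonith_resource(fencer, "prm_stonith_sbd")
print(fencer.devices)        # the device is now known to the fencer
print(fencer.fenced_nodes)   # still empty: starting the resource fenced nobody
```

In this model a node ends up in fenced_nodes only after an explicit
fencer.fence("h16") call, which corresponds to the separate "Requesting
fencing (reboot) of node h16" line in the log.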

> ...
> Dec 18 09:31:14 h18 pacemaker-controld[4479]:  error: Node h18 did not send start result (via controller) within 45000ms (action timeout plus cluster-delay)

I am not sure what happens here. Either fenced took a very long time to
respond, or something went wrong in the communication between them.

> Dec 18 09:31:14 h18 pacemaker-controld[4479]:  error: [Action   22]: In-flight resource op prm_stonith_sbd_start_0      on h18 (priority: 9900, waiting: (null))
> Dec 18 09:31:14 h18 pacemaker-controld[4479]:  notice: Transition 0 aborted: Action lost
> Dec 18 09:31:14 h18 pacemaker-controld[4479]:  warning: rsc_op 22: prm_stonith_sbd_start_0 on h18 timed out
> ...
> Dec 18 09:31:15 h18 pacemaker-controld[4479]:  notice: Peer h16 was terminated (reboot) by h18 on behalf of pacemaker-controld.4527: OK
> Dec 18 09:31:17 h18 pacemaker-execd[4476]:  notice: prm_stonith_sbd start (call 164) exited with status 0 (execution time 110960ms, queue time 15001ms)

It could be related to the pending fencing, but I am not familiar with the
low-level details.
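
To make the ordering easier to see, here is a small Python sketch that only
does arithmetic on the timestamps from the log lines quoted so far. The event
labels and the helper name seconds_between are mine, not pacemaker's, and the
log has one-second resolution, so the offsets are approximate.

```python
from datetime import datetime

# Timestamps copied from the quoted h18 log lines (Dec 18 2020).
events = [
    ("fencing of h16 requested", "09:29:29"),
    ("start of prm_stonith_sbd initiated", "09:29:29"),
    ("start reported as timed out", "09:31:14"),
    ("fencing of h16 confirmed (by h18)", "09:31:15"),
    ("start returned with status 0", "09:31:17"),
]


def seconds_between(earlier, later):
    """Whole seconds between two HH:MM:SS stamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return int(delta.total_seconds())


t0 = events[0][1]
for label, stamp in events:
    print(f"+{seconds_between(t0, stamp):3d}s  {label}")
```

By these numbers the start result came back only after the first fencing
confirmation. That is consistent with the start being held up by the pending
fencing, but it does not prove it.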

> ...
> Dec 18 09:31:30 h18 pacemaker-controld[4479]:  notice: Peer h16 was terminated (reboot) by h19 on behalf of pacemaker-controld.4479: OK
> Dec 18 09:31:30 h18 pacemaker-controld[4479]:  notice: Transition 0 (Complete=31, Pending=0, Fired=0, Skipped=1, Incomplete=3, Source=/var/lib/pacemaker/pengine/pe-warn-9.bz2): Stopped
> ...
> Dec 18 09:31:30 h18 pacemaker-schedulerd[4478]:  warning: Unexpected result (error) was recorded for start of prm_stonith_sbd on h18 at Dec 18 09:31:14 2020
> Dec 18 09:31:30 h18 pacemaker-schedulerd[4478]:  notice:  * Recover    prm_stonith_sbd                      (             h18 )
> ...
> 
> Regards,
> Ulrich