<div dir="ltr">Hi!<div><br></div><div>I have a 2-node cluster with shared storage and SBD fencing.</div><div>One node was down for maintenance.</div><div>For external reasons, the second node was rebooted. After the reboot, the service never came up:</div>
<div><br></div><div><div>Oct 29 13:04:21 wcs2 pengine[2362]: warning: stage6: Scheduling Node wcs1 for STONITH</div><div>Oct 29 13:04:21 wcs2 crmd[2363]: notice: te_fence_node: Executing reboot fencing operation (53) on wcs1 (timeout=60000)<br>
Oct 29 13:05:33 wcs2 stonith-ng[2359]: error: remote_op_done: Operation reboot of wcs1 by wcs2 for crmd.2363@wcs2.4a3b045d: Timer expired<br></div><div><div>Oct 29 13:05:33 wcs2 crmd[2363]: notice: tengine_stonith_callback: Stonith operation 2/53:0:0:f56c4538-1ad8-4871-825e-167eb9304677: Timer expired (-62)</div>
<div>Oct 29 13:05:33 wcs2 crmd[2363]: notice: tengine_stonith_callback: Stonith operation 2 for wcs1 failed (Timer expired): aborting transition.</div><div>Oct 29 13:05:33 wcs2 crmd[2363]: notice: tengine_stonith_notify: Peer wcs1 was not terminated (st_notify_fence) by wcs2 for wcs2: Timer expired (ref=4a3b045d-cc08-4e2f-8279-a85d113781b2) by client crmd.2363</div>
<div>Oct 29 13:05:33 wcs2 crmd[2363]: notice: run_graph: Transition 0 (Complete=20, Pending=0, Fired=0, Skipped=29, Incomplete=0, Source=/usr/var/lib/pacemaker/pengine/pe-warn-54.bz2): Stopped</div><div>Oct 29 13:05:33 wcs2 pengine[2362]: notice: unpack_config: On loss of CCM Quorum: Ignore</div>
<div>Oct 29 13:05:33 wcs2 pengine[2362]: warning: stage6: Scheduling Node wcs1 for STONITH</div></div><div><br></div><div>And this repeats forever in a loop...</div><div><br></div><div>Node wcs1 is off. Shouldn't SBD detect that, and shouldn't the cluster then start the resources?</div>
<div><br></div><div>Best regards,</div><div>Alexandr A. Alexandrov</div><div><br></div>-- <br>With respect, AAA.
</div></div>