<div dir="ltr">Thanks for your reply, Digimer.<br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 13, 2017 at 1:35 PM, Digimer <span dir="ltr"><<a href="mailto:lists@alteeve.ca" target="_blank">lists@alteeve.ca</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>On 13/03/17 12:07 PM, Chris Walker wrote:<br>
> Hello,<br>
><br>
> On our two-node EL7 cluster (pacemaker: 1.1.15-11.el7_3.4; corosync:<br>
> 2.4.0-4; libqb: 1.0-1),<br>
> it looks like successful STONITH operations are not communicated from<br>
> stonith-ng back to the initiator (in this case, crmd) until the STONITHed<br>
> node is removed from the cluster when<br>
> Corosync notices that it's gone (i.e., after the token timeout).<br>
<br>
</span>Others might have more useful info, but my understanding of a lost node<br>
sequence is this:<br>
<br>
1. Node stops responding, corosync declares it lost after token timeout<br>
2. Corosync reforms the cluster with remaining node(s), checks if it is<br>
quorate (always true in 2-node)<br>
3. Corosync informs Pacemaker of the membership change.<br>
4. Pacemaker invokes stonith, waits for the fence agent to return<br>
"success" (exit code of the agent as per the FenceAgentAPI<br>
[<a href="https://docs.pagure.org/ClusterLabs.fence-agents/FenceAgentAPI.md" rel="noreferrer" target="_blank">https://docs.pagure.org/ClusterLabs.fence-agents/FenceAgentAPI.md</a>]). If<br>
the method fails, it moves on to the next method. If all methods fail,<br>
it goes back to the first method and tries again, looping indefinitely.<br>
<div><div class="gmail-m_-2709524083496572569h5"><br></div></div></blockquote><div><br></div><div>That's roughly my understanding as well for the case when a node suddenly leaves the cluster (e.g., poweroff), and this case is working as expected for me. I'm seeing delays when a node is marked for STONITH while it's still up (e.g., after a stop operation fails). In this case, what I expect to see is something like:</div><div>1. crmd requests that stonith-ng fence the node</div><div>2. stonith-ng (might be a different stonith-ng) fences the node and sends a message that it has succeeded</div><div>3. stonith-ng (the original from step 1) receives this message and communicates back to crmd that the node has been fenced</div><div><br></div><div>but what I'm seeing is</div><div>1. crmd requests that stonith-ng fence the node</div><div>2. stonith-ng fences the node and sends a message saying that it has succeeded</div><div>3. nobody hears this message</div><div>4. Corosync eventually realizes that the fenced node is no longer part of the config and broadcasts a config change</div><div>5. 
stonith-ng finishes the STONITH operation that was started earlier and communicates back to crmd that the node has been STONITHed</div><div><br></div><div>I'm less convinced that the sending of the STONITH notify in step 2 is at fault; it also seems possible that a callback is not being run when it should be.</div><div><br></div><div><br></div><div>The Pacemaker configuration is not important (I've seen this happen on our production clusters and on a small sandbox), but the config is:</div><div><br></div><div><div>primitive bug0-stonith stonith:fence_ipmilan \</div><div> params pcmk_host_list=bug0 ipaddr=bug0-ipmi action=off login=admin passwd=admin \</div><div> meta target-role=Started</div><div>primitive bug1-stonith stonith:fence_ipmilan \</div><div> params pcmk_host_list=bug1 ipaddr=bug1-ipmi action=off login=admin passwd=admin \</div><div> meta target-role=Started</div><div>primitive prm-snmp-heartbeat snmptrap_heartbeat \</div><div> params snmphost=bug0 community=public \</div><div> op monitor interval=10 timeout=300 \</div><div> op start timeout=300 interval=0 \</div><div> op stop timeout=300 interval=0</div><div>clone cln-snmp-heartbeat prm-snmp-heartbeat \</div><div> meta interleave=true globally-unique=false ordered=false notify=false</div><div>location bug0-stonith-loc bug0-stonith -inf: bug0</div><div>location bug1-stonith-loc bug1-stonith -inf: bug1</div></div><div><br></div><div>The corosync config might be more interesting:</div><div><br></div><div><div>totem {</div><div> version: 2</div><div> crypto_cipher: none</div><div> crypto_hash: none</div><div> secauth: off</div><div> rrp_mode: passive</div><div> transport: udpu</div><div> token: 240000</div><div> consensus: 1000</div><div><br></div><div> interface {</div><div> ringnumber: 0</div><div> bindnetaddr: 203.0.113.0</div><div> mcastport: 5405</div><div> ttl: 1</div><div> }</div><div>}</div></div><div><div>nodelist {</div><div> node {</div><div> ring0_addr: 203.0.113.1</div><div> nodeid: 
1</div><div> }</div><div> node {</div><div> ring0_addr: 203.0.113.2</div><div> nodeid: 2</div><div> }</div><div>}</div></div><div><div>quorum {</div><div> provider: corosync_votequorum</div><div> two_node: 1</div><div>}</div></div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div class="gmail-m_-2709524083496572569h5">
> In trace debug logs, I see the STONITH reply sent via the<br>
> cpg_mcast_joined (libqb) function in crm_cs_flush<br>
> (stonith_send_async_reply->send_cluster_text->send_cluster_text->send_cpg_iov->crm_cs_flush->cpg_mcast_joined)<br>
><br>
> Mar 13 07:18:22 [6466] bug0 stonith-ng: ( commands.c:1891 ) trace:<br>
> stonith_send_async_reply: Reply <st-reply st_origin="bug1"<br>
> t="stonith-ng" st_op="st_fence" st_device_id="ustonith:0"<br>
> st_remote_op="39b1f1e0-b76f-4d25-bd15-77b956c914a0"<br>
> st_clientid="823e92da-8470-491a-b662-215526cced22"<br>
> st_clientname="crmd.3973" st_target="bug1" st_device_action="st_fence"<br>
> st_callid="3" st_callopt="0" st_rc="0" st_output="Chassis Power Control:<br>
> Reset\nChassis Power Control: Down/Off\nChassis Power Control: Down/Off\nC<br>
> Mar 13 07:18:22 [6466] bug0 stonith-ng: ( cpg.c:636 ) trace:<br>
> send_cluster_text: Queueing CPG message 9 to all (1041 bytes, 449<br>
> bytes payload): <st-reply st_origin="bug1" t="stonith-ng"<br>
> st_op="st_notify" st_device_id="ustonith:0"<br>
> st_remote_op="39b1f1e0-b76f-4d25-bd15-77b956c914a0"<br>
> st_clientid="823e92da-8470-491a-b662-215526cced22" st_clientna<br>
> Mar 13 07:18:22 [6466] bug0 stonith-ng: ( cpg.c:207 ) trace:<br>
> send_cpg_iov: Queueing CPG message 9 (1041 bytes)<br>
> Mar 13 07:18:22 [6466] bug0 stonith-ng: ( cpg.c:170 ) trace:<br>
> crm_cs_flush: CPG message sent, size=1041<br>
> Mar 13 07:18:22 [6466] bug0 stonith-ng: ( cpg.c:185 ) trace:<br>
> crm_cs_flush: Sent 1 CPG messages (0 remaining, last=9): OK (1)<br>
><br>
> But I see no further action from stonith-ng until minutes later;<br>
> specifically, I don't see remote_op_done run, so the dead node is still<br>
> an 'online (unclean)' member of the array and no failover can take place.<br>
><br>
> When the token expires (yes, we use a very long token), I see the following:<br>
><br>
> Mar 13 07:22:11 [6466] bug0 stonith-ng: (membership.c:1018 ) notice:<br>
> crm_update_peer_state_iter: Node bug1 state is now lost | nodeid=2<br>
> previous=member source=crm_update_peer_proc<br>
> Mar 13 07:22:11 [6466] bug0 stonith-ng: ( main.c:1275 ) debug:<br>
> st_peer_update_callback: Broadcasting our uname because of node 2<br>
> Mar 13 07:22:11 [6466] bug0 stonith-ng: ( cpg.c:636 ) trace:<br>
> send_cluster_text: Queueing CPG message 10 to all (666 bytes, 74<br>
> bytes payload): <stonith_command __name__="stonith_command"<br>
> t="stonith-ng" st_op="poke"/><br>
> ...<br>
> Mar 13 07:22:11 [6466] bug0 stonith-ng: ( commands.c:2582 ) debug:<br>
> stonith_command: Processing st_notify reply 0 from bug0 ( 0)<br>
> Mar 13 07:22:11 [6466] bug0 stonith-ng: ( remote.c:1945 ) debug:<br>
> process_remote_stonith_exec: Marking call to poweroff for bug1 on<br>
> behalf of crmd.3973@39b1f1e0-b76f-4d25-bd15-77b956c914a0.bug1: OK (0)<br>
><br>
> and the STONITH command is finally communicated back to crmd as complete<br>
> and failover commences.<br>
><br>
> Is this delay a feature of the cpg_mcast_joined function? If I<br>
> understand correctly (unlikely), it looks like cpg_mcast_joined is not<br>
> completing because one of the nodes in the group is missing, but I<br>
> haven't looked at that code closely yet. Is it advisable to have<br>
> stonith-ng broadcast a membership change when it successfully fences a node?<br>
><br>
> Attaching logs with PCMK_debug=stonith-ng<br>
> and PCMK_trace_functions=stonith_send_async_reply,send_cluster_text,send_cpg_iov,crm_cs_flush<br>
><br>
> Thanks in advance,<br>
> Chris<br>
<br>
</div></div>Can you share your full pacemaker config (please obfuscate passwords)?<br>
<br>
--<br>
Digimer<br>
Papers and Projects: <a href="https://alteeve.com/w/" rel="noreferrer" target="_blank">https://alteeve.com/w/</a><br>
"I am, somehow, less interested in the weight and convolutions of<br>
Einstein’s brain than in the near certainty that people of equal talent<br>
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould<br>
<br>
_______________________________________________<br>
Users mailing list: <a href="mailto:Users@clusterlabs.org" target="_blank">Users@clusterlabs.org</a><br>
<a href="http://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.clusterlabs.org/mailman/listinfo/users</a><br>
<br>
Project Home: <a href="http://www.clusterlabs.org" rel="noreferrer" target="_blank">http://www.clusterlabs.org</a><br>
Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" rel="noreferrer" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
Bugs: <a href="http://bugs.clusterlabs.org" rel="noreferrer" target="_blank">http://bugs.clusterlabs.org</a><br>
</blockquote></div><br></div></div>
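<div><br></div><div>The timing in the logs quoted above is itself consistent with the token timeout: the gap between the fence reply (07:18:22) and the membership change (07:22:11) is just under the configured token of 240000 ms, which fits the theory that the reply only gets processed once corosync declares the node lost. A minimal sketch of that arithmetic, using the timestamps from the logs and the token value from the corosync.conf above:</div><div><br></div>

```python
from datetime import datetime

# Timestamps copied from the stonith-ng trace logs above
fence_reply = datetime.strptime("Mar 13 07:18:22", "%b %d %H:%M:%S")
membership_lost = datetime.strptime("Mar 13 07:22:11", "%b %d %H:%M:%S")

token_ms = 240000  # totem token from the corosync.conf above

delay = (membership_lost - fence_reply).total_seconds()
print(f"delay = {delay:.0f}s, token = {token_ms / 1000:.0f}s")
# delay = 229s, token = 240s

# The fenced node presumably stopped responding a few seconds before the
# reply was logged (the power-off happens mid-operation), so a gap just
# under the token timeout is what we'd expect if completion of the fence
# operation is gated on the corosync membership change.
```

<div>(This is only a plausibility check on the numbers, not an explanation of where in stonith-ng the reply is being held up.)</div>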