<div dir="ltr">Thanks for your reply, Digimer.<br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 13, 2017 at 1:35 PM, Digimer <span dir="ltr"><<a href="mailto:lists@alteeve.ca" target="_blank">lists@alteeve.ca</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>On 13/03/17 12:07 PM, Chris Walker wrote:<br>

> Hello,<br>

><br>

> On our two-node EL7 cluster (pacemaker: 1.1.15-11.el7_3.4; corosync:<br>

> 2.4.0-4; libqb: 1.0-1),<br>

> it looks like successful STONITH operations are not communicated from<br>

> stonith-ng back to the initiator (in this case, crmd) until the STONITHed<br>

> node is removed from the cluster when<br>

> Corosync notices that it's gone (i.e., after the token timeout).<br>

<br>

</span>Others might have more useful info, but my understanding of a lost node<br>

sequence is this:<br>

<br>

1. Node stops responding, corosync declares it lost after token timeout<br>

2. Corosync reforms the cluster with remaining node(s), checks if it is<br>

quorate (always true in 2-node)<br>

3. Corosync informs Pacemaker of the membership change.<br>

4. Pacemaker invokes stonith, waits for the fence agent to return<br>

"success" (exit code of the agent as per the FenceAgentAPI<br>

[<a href="https://docs.pagure.org/ClusterLabs.fence-agents/FenceAgentAPI.md" rel="noreferrer" target="_blank">https://docs.pagure.org/ClusterLabs.fence-agents/FenceAgentAPI.md</a>]). If<br>

the method fails, it moves on to the next method. If all methods fail,<br>

it goes back to the first method and tries again, looping indefinitely.<br>

<div><div class="gmail-m_-2709524083496572569h5"><br></div></div></blockquote><div><br></div><div>That's roughly my understanding as well for the case when a node suddenly leaves the cluster (e.g., poweroff), and this case is working as expected for me.  I'm seeing delays when a node is marked for STONITH while it's still up (e.g., after a stop operation fails).  In this case, what I expect to see is something like:</div><div>1.  crmd requests that stonith-ng fence the node</div><div>2.  stonith-ng (might be a different stonith-ng) fences the node and sends a message that it has succeeded</div><div>3.  stonith-ng (the original from step 1) receives this message and communicates back to crmd that the node has been fenced</div><div><br></div><div>but what I'm seeing is</div><div>1.  crmd requests that stonith-ng fence the node</div><div>2.  stonith-ng fences the node and sends a message saying that it has succeeded</div><div>3.  nobody hears this message</div><div>4.  Corosync eventually realizes that the fenced node is no longer part of the config and broadcasts a config change</div><div>5.  
stonith-ng finishes the STONITH operation that was started earlier and communicates back to crmd that the node has been STONITHed</div><div><br></div><div>I'm less convinced that the sending of the STONITH notify in step 2 is at fault; it also seems possible that a callback is not being run when it should be.</div><div><br></div><div><br></div><div>The Pacemaker configuration is not important (I've seen this happen on our production clusters and on a small sandbox), but the config is:</div><div><br></div><div><div>primitive bug0-stonith stonith:fence_ipmilan \</div><div>        params pcmk_host_list=bug0 ipaddr=bug0-ipmi action=off login=admin passwd=admin \</div><div>        meta target-role=Started</div><div>primitive bug1-stonith stonith:fence_ipmilan \</div><div>        params pcmk_host_list=bug1 ipaddr=bug1-ipmi action=off login=admin passwd=admin \</div><div>        meta target-role=Started</div><div>primitive prm-snmp-heartbeat snmptrap_heartbeat \</div><div>        params snmphost=bug0 community=public \</div><div>        op monitor interval=10 timeout=300 \</div><div>        op start timeout=300 interval=0 \</div><div>        op stop timeout=300 interval=0</div><div>clone cln-snmp-heartbeat prm-snmp-heartbeat \</div><div>        meta interleave=true globally-unique=false ordered=false notify=false</div><div>location bug0-stonith-loc bug0-stonith -inf: bug0</div><div>location bug1-stonith-loc bug1-stonith -inf: bug1</div></div><div><br></div><div>The corosync config might be more interesting:</div><div><br></div><div><div>totem {</div><div>    version: 2</div><div>    crypto_cipher: none</div><div>    crypto_hash: none</div><div>    secauth: off</div><div>    rrp_mode: passive</div><div>    transport: udpu</div><div>    token: 240000</div><div>    consensus: 1000</div><div><br></div><div>    interface {</div><div>        ringnumber 0</div><div>        bindnetaddr: 203.0.113.0</div><div>        mcastport: 5405</div><div>        ttl: 1</div><div>    
}</div><div>}</div></div><div><div>nodelist {</div><div>        node {<br></div><div>                ring0_addr: 203.0.113.1</div><div>                nodeid: 1</div><div>        }</div><div>        node {</div><div>                ring0_addr: 203.0.113.2</div><div>                nodeid: 2</div><div>        }</div><div>}</div></div><div><div>quorum {</div><div>    provider: corosync_votequorum</div><div>    two_node: 1</div><div>}</div></div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div class="gmail-m_-2709524083496572569h5">

> In trace debug logs, I see the STONITH reply sent via the<br>

> cpg_mcast_joined (libqb) function in crm_cs_flush<br>

> (stonith_send_async_reply->send_cluster_text->send_cluster_text->send_cpg_iov->crm_cs_flush->cpg_mcast_joined)<br>

><br>

> Mar 13 07:18:22 [6466] bug0 stonith-ng: (  commands.c:1891  )   trace:<br>

> stonith_send_async_reply:        Reply   <st-reply st_origin="bug1"<br>

> t="stonith-ng" st_op="st_fence" st_device_id="ustonith:0"<br>

> st_remote_op="39b1f1e0-b76f-4d25-bd15-77b956c914a0"<br>

> st_clientid="823e92da-8470-491a-b662-215526cced22"<br>

> st_clientname="crmd.3973" st_target="bug1" st_device_action="st_fence"<br>

> st_callid="3" st_callopt="0" st_rc="0" st_output="Chassis Power Control:<br>

> Reset\nChassis Power Control: Down/Off\nChassis Power Control: Down/Off\nC<br>

> Mar 13 07:18:22 [6466] bug0 stonith-ng: (       cpg.c:636   )   trace:<br>

> send_cluster_text:       Queueing CPG message 9 to all (1041 bytes, 449<br>

> bytes payload): <st-reply st_origin="bug1" t="stonith-ng"<br>

> st_op="st_notify" st_device_id="ustonith:0"<br>

> st_remote_op="39b1f1e0-b76f-4d25-bd15-77b956c914a0"<br>

> st_clientid="823e92da-8470-491a-b662-215526cced22" st_clientna<br>

> Mar 13 07:18:22 [6466] bug0 stonith-ng: (       cpg.c:207   )   trace:<br>

> send_cpg_iov:    Queueing CPG message 9 (1041 bytes)<br>

> Mar 13 07:18:22 [6466] bug0 stonith-ng: (       cpg.c:170   )   trace:<br>

> crm_cs_flush:    CPG message sent, size=1041<br>

> Mar 13 07:18:22 [6466] bug0 stonith-ng: (       cpg.c:185   )   trace:<br>

> crm_cs_flush:    Sent 1 CPG messages  (0 remaining, last=9): OK (1)<br>

><br>

> But I see no further action from stonith-ng until minutes later;<br>

> specifically, I don't see remote_op_done run, so the dead node is still<br>

> an 'online (unclean)' member of the array and no failover can take place.<br>

><br>

> When the token expires (yes, we use a very long token), I see the following:<br>

><br>

> Mar 13 07:22:11 [6466] bug0 stonith-ng: (membership.c:1018  )  notice:<br>

> crm_update_peer_state_iter:      Node bug1 state is now lost | nodeid=2<br>

> previous=member source=crm_update_peer_proc<br>

> Mar 13 07:22:11 [6466] bug0 stonith-ng: (      main.c:1275  )   debug:<br>

> st_peer_update_callback: Broadcasting our uname because of node 2<br>

> Mar 13 07:22:11 [6466] bug0 stonith-ng: (       cpg.c:636   )   trace:<br>

> send_cluster_text:       Queueing CPG message 10 to all (666 bytes, 74<br>

> bytes payload): <stonith_command __name__="stonith_command"<br>

> t="stonith-ng" st_op="poke"/><br>

> ...<br>

> Mar 13 07:22:11 [6466] bug0 stonith-ng: (  commands.c:2582  )   debug:<br>

> stonith_command: Processing st_notify reply 0 from bug0 (               0)<br>

> Mar 13 07:22:11 [6466] bug0 stonith-ng: (    remote.c:1945  )   debug:<br>

> process_remote_stonith_exec:     Marking call to poweroff for bug1 on<br>

> behalf of crmd.3973@39b1f1e0-b76f-4d25-bd15-77b956c914a0.bug1: OK (0)<br>

><br>

> and the STONITH command is finally communicated back to crmd as complete<br>

> and failover commences.<br>

><br>

> Is this delay a feature of the cpg_mcast_joined function?  If I<br>

> understand correctly (unlikely), it looks like cpg_mcast_joined is not<br>

> completing because one of the nodes in the group is missing, but I<br>

> haven't looked at that code closely yet.  Is it advisable to have<br>

> stonith-ng broadcast a membership change when it successfully fences a node?<br>

><br>

> Attaching logs with PCMK_debug=stonith-ng<br>

> and PCMK_trace_functions=stonith_send_async_reply,send_cluster_text,send_cpg_iov,crm_cs_flush<br>

><br>

> Thanks in advance,<br>

> Chris<br>

<br>

</div></div>Can you share your full pacemaker config (please obfuscate passwords)?<br>

<br>

--<br>

Digimer<br>

Papers and Projects: <a href="https://alteeve.com/w/" rel="noreferrer" target="_blank">https://alteeve.com/w/</a><br>

"I am, somehow, less interested in the weight and convolutions of<br>

Einstein’s brain than in the near certainty that people of equal talent<br>

have lived and died in cotton fields and sweatshops." - Stephen Jay Gould<br>

<br>

______________________________<wbr>_________________<br>

Users mailing list: <a href="mailto:Users@clusterlabs.org" target="_blank">Users@clusterlabs.org</a><br>

<a href="http://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.clusterlabs.org/mailman/listinfo/users</a><br>

<br>

Project Home: <a href="http://www.clusterlabs.org" rel="noreferrer" target="_blank">http://www.clusterlabs.org</a><br>

Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" rel="noreferrer" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>

Bugs: <a href="http://bugs.clusterlabs.org" rel="noreferrer" target="_blank">http://bugs.clusterlabs.org</a><br>

</blockquote></div><br></div></div>
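<div dir="ltr"><br>As a rough sanity check on the timing, the gap between the "CPG message sent" trace and the "Node bug1 state is now lost" notice in the quoted logs can be compared with the token value from the corosync config. The snippet below is only a minimal sketch: the helper names are made up for illustration, the two timestamps are copied from the logs above, and the year 2017 is an assumption taken from the date of this thread (the log lines themselves carry no year).</div>
<pre>
```python
from datetime import datetime

# Timestamps copied from the trace logs quoted in this thread.
REPLY_SENT = "Mar 13 07:18:22"  # crm_cs_flush: CPG message sent
NODE_LOST = "Mar 13 07:22:11"   # crm_update_peer_state_iter: bug1 is now lost
TOKEN_MS = 240000               # totem.token from the corosync config


def parse_stamp(stamp, year=2017):
    """Parse a 'Mon DD HH:MM:SS' log timestamp (year is an assumption)."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def delay_seconds(start, end):
    """Whole seconds elapsed between two log timestamps."""
    return int((parse_stamp(end) - parse_stamp(start)).total_seconds())


delay = delay_seconds(REPLY_SENT, NODE_LOST)
print(f"reply sent to node lost: {delay} s (token is {TOKEN_MS // 1000} s)")
```
</pre>
<div dir="ltr">With these two timestamps the gap comes out at 229 seconds, a little under the 240-second token. That would be consistent with the reply only being processed once Corosync declares the fenced node lost, though it does not by itself show where the message is held up.</div>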