[ClusterLabs] STONITH not communicated back to initiator until token expires

Mon Mar 13 13:35:27 EDT 2017

On 13/03/17 12:07 PM, Chris Walker wrote:
> Hello,
> 
> On our two-node EL7 cluster (pacemaker: 1.1.15-11.el7_3.4; corosync:
> 2.4.0-4; libqb: 1.0-1),
> it looks like successful STONITH operations are not communicated from
> stonith-ng back to the initiator (in this case, crmd) until the STONITHed
> node is removed from the cluster when
> Corosync notices that it's gone (i.e., after the token timeout).

Others might have more useful info, but my understanding of the lost-node
sequence is this:

1. Node stops responding, Corosync declares it lost after token timeout
2. Corosync reforms the cluster with remaining node(s), checks if it is
quorate (always true in 2-node)
3. Corosync informs Pacemaker of the membership change.
4. Pacemaker invokes stonith, waits for the fence agent to return
"success" (exit code of the agent as per the FenceAgentAPI
[https://docs.pagure.org/ClusterLabs.fence-agents/FenceAgentAPI.md]). If
the method fails, it moves on to the next method. If all methods fail,
it goes back to the first method and tries again, looping indefinitely.
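
To make step 4 concrete, here is a minimal Python sketch of the retry
behaviour as I described it. It is not Pacemaker's actual code: the name
fence_until_success, the FenceMethod type and the max_attempts cap are all
made up for illustration, and each "method" is just a callable that returns
an exit code, with 0 meaning success.

```python
from itertools import cycle
from typing import Callable, Optional, Sequence

# A fence method takes the target node name and returns an exit code;
# 0 means success, mirroring a fence agent's exit status.
FenceMethod = Callable[[str], int]


def fence_until_success(
    methods: Sequence[FenceMethod],
    target: str,
    max_attempts: Optional[int] = None,  # hypothetical cap, only so the sketch can stop
) -> int:
    """Try each method in order, wrapping around, until one returns 0.

    Returns the index of the method that succeeded.
    """
    if not methods:
        raise ValueError("at least one fence method is required")
    attempts = 0
    # cycle() restarts from the first method after the last one fails.
    for index, method in cycle(enumerate(methods)):
        if max_attempts is not None and attempts >= max_attempts:
            raise RuntimeError(f"gave up fencing {target} after {attempts} attempts")
        attempts += 1
        if method(target) == 0:
            return index
    raise AssertionError("unreachable: cycle() over a non-empty sequence never ends")
```

With max_attempts left at None the loop never gives up, which matches the
"looping indefinitely" part of step 4.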

> In trace debug logs, I see the STONITH reply sent via the
> cpg_mcast_joined (libqb) function in crm_cs_flush
> (stonith_send_async_reply->send_cluster_text->send_cluster_text->send_cpg_iov->crm_cs_flush->cpg_mcast_joined)
> 
> Mar 13 07:18:22 [6466] bug0 stonith-ng: (  commands.c:1891  )   trace:
> stonith_send_async_reply:        Reply   <st-reply st_origin="bug1"
> t="stonith-ng" st_op="st_fence" st_device_id="ustonith:0"
> st_remote_op="39b1f1e0-b76f-4d25-bd15-77b956c914a0"
> st_clientid="823e92da-8470-491a-b662-215526cced22"
> st_clientname="crmd.3973" st_target="bug1" st_device_action="st_fence"
> st_callid="3" st_callopt="0" st_rc="0" st_output="Chassis Power Control:
> Reset\nChassis Power Control: Down/Off\nChassis Power Control: Down/Off\nC
> Mar 13 07:18:22 [6466] bug0 stonith-ng: (       cpg.c:636   )   trace:
> send_cluster_text:       Queueing CPG message 9 to all (1041 bytes, 449
> bytes payload): <st-reply st_origin="bug1" t="stonith-ng"
> st_op="st_notify" st_device_id="ustonith:0"
> st_remote_op="39b1f1e0-b76f-4d25-bd15-77b956c914a0"
> st_clientid="823e92da-8470-491a-b662-215526cced22" st_clientna
> Mar 13 07:18:22 [6466] bug0 stonith-ng: (       cpg.c:207   )   trace:
> send_cpg_iov:    Queueing CPG message 9 (1041 bytes)
> Mar 13 07:18:22 [6466] bug0 stonith-ng: (       cpg.c:170   )   trace:
> crm_cs_flush:    CPG message sent, size=1041
> Mar 13 07:18:22 [6466] bug0 stonith-ng: (       cpg.c:185   )   trace:
> crm_cs_flush:    Sent 1 CPG messages  (0 remaining, last=9): OK (1)
> 
> But I see no further action from stonith-ng until minutes later;
> specifically, I don't see remote_op_done run, so the dead node is still
> an 'online (unclean)' member of the array and no failover can take place.
> 
> When the token expires (yes, we use a very long token), I see the following:
> 
> Mar 13 07:22:11 [6466] bug0 stonith-ng: (membership.c:1018  )  notice:
> crm_update_peer_state_iter:      Node bug1 state is now lost | nodeid=2
> previous=member source=crm_update_peer_proc
> Mar 13 07:22:11 [6466] bug0 stonith-ng: (      main.c:1275  )   debug:
> st_peer_update_callback: Broadcasting our uname because of node 2
> Mar 13 07:22:11 [6466] bug0 stonith-ng: (       cpg.c:636   )   trace:
> send_cluster_text:       Queueing CPG message 10 to all (666 bytes, 74
> bytes payload): <stonith_command __name__="stonith_command"
> t="stonith-ng" st_op="poke"/>
> ...
> Mar 13 07:22:11 [6466] bug0 stonith-ng: (  commands.c:2582  )   debug:
> stonith_command: Processing st_notify reply 0 from bug0 (               0)
> Mar 13 07:22:11 [6466] bug0 stonith-ng: (    remote.c:1945  )   debug:
> process_remote_stonith_exec:     Marking call to poweroff for bug1 on
> behalf of crmd.3973 at 39b1f1e0-b76f-4d25-bd15-77b956c914a0.bug1: OK (0)
> 
> and the STONITH command is finally communicated back to crmd as complete
> and failover commences.
> 
> Is this delay a feature of the cpg_mcast_joined function?  If I
> understand correctly (unlikely), it looks like cpg_mcast_joined is not
> completing because one of the nodes in the group is missing, but I
> haven't looked at that code closely yet.  Is it advisable to have
> stonith-ng broadcast a membership change when it successfully fences a node?
> 
> Attaching logs with PCMK_debug=stonith-ng
> and PCMK_trace_functions=stonith_send_async_reply,send_cluster_text,send_cpg_iov,crm_cs_flush
> 
> Thanks in advance,
> Chris

Can you share your full pacemaker config (please obfuscate passwords)?
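
For a sense of scale, the two log excerpts you quoted bracket the delay: the
reply is queued at 07:18:22 and the st_notify is processed at 07:22:11. A
minimal Python sketch to put a number on that, assuming the syslog-style
timestamps shown in your logs and taking the year from the date of this
thread (the helper name seconds_between is only for illustration):

```python
from datetime import datetime


def seconds_between(first: str, second: str, year: int = 2017) -> float:
    """Return the seconds elapsed between two 'Mon DD HH:MM:SS' timestamps.

    The log lines carry no year, so one is supplied (assumed: 2017).
    """
    fmt = "%Y %b %d %H:%M:%S"
    start = datetime.strptime(f"{year} {first}", fmt)
    end = datetime.strptime(f"{year} {second}", fmt)
    return (end - start).total_seconds()


# CPG reply queued vs. st_notify reply processed, from the quoted logs.
print(seconds_between("Mar 13 07:18:22", "Mar 13 07:22:11"))  # 229.0
```

That is just under four minutes between the fence agent returning and crmd
being told about it.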

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould