[ClusterLabs] STONITH not communicated back to initiator until token expires

Mon Mar 13 23:43:50 EDT 2017

Thanks for your reply Digimer.

On Mon, Mar 13, 2017 at 1:35 PM, Digimer <lists at alteeve.ca> wrote:

> On 13/03/17 12:07 PM, Chris Walker wrote:
> > Hello,
> >
> > On our two-node EL7 cluster (pacemaker: 1.1.15-11.el7_3.4; corosync:
> > 2.4.0-4; libqb: 1.0-1),
> > it looks like successful STONITH operations are not communicated from
> > stonith-ng back to theinitiator (in this case, crmd) until the STONITHed
> > node is removed from the cluster when
> > Corosync notices that it's gone (i.e., after the token timeout).
>
> Others might have more useful info, but my understanding of a lost node
> sequence is this;
>
> 1. Node stops responding, corosync declares it lost after token timeout
> 2. Corosync reforms the cluster with remaining node(s), checks if it is
> quorate (always true in 2-node)
> 3. Corosync informs Pacemaker of the membership change.
> 4. Pacemaker invokes stonith, waits for the fence agent to return
> "success" (exit code of the agent as per the FenceAgentAPI
> [https://docs.pagure.org/ClusterLabs.fence-agents/FenceAgentAPI.md]). If
> the method fails, it moves on to the next method. If all methods fail,
> it goes back to the first method and tries again, looping indefinitely.
>
>
That's roughly my understanding as well for the case when a node suddenly
leaves the cluster (e.g., poweroff), and this case is working as expected
for me.  I'm seeing delays when a node is marked for STONITH while it's
still up (e.g., after a stop operation fails).  In this case, what I expect
to see is something like:
1.  crmd requests that stonith-ng fence the node
2.  stonith-ng (might be a different stonith-ng) fences the node and sends
a message that it has succeeded
3.  stonith-ng (the original from step 1) receives this message and
communicates back to crmd that the node has been fenced

but what I'm seeing is
1.  crmd requests that stonith-ng fence the node
2.  stonith-ng fences the node and sends a message saying that it has
succeeded
3.  nobody hears this message
4.  Corosync eventually realizes that the fenced node is no longer part of
the config and broadcasts a config change
5.  stonith-ng finishes the STONITH operation that was started earlier and
communicates back to crmd that the node has been STONITHed

I'm less convinced that the sending of the STONITH notify in step 2 is at
fault; it also seems possible that a callback is not being run when it
should be.

The Pacemaker configuration is not important (I've seen this happen on our
production clusters and on a small sandbox), but the config is:

primitive bug0-stonith stonith:fence_ipmilan \
        params pcmk_host_list=bug0 ipaddr=bug0-ipmi action=off login=admin
passwd=admin \
        meta target-role=Started
primitive bug1-stonith stonith:fence_ipmilan \
        params pcmk_host_list=bug1 ipaddr=bug1-ipmi action=off login=admin
passwd=admin \
        meta target-role=Started
primitive prm-snmp-heartbeat snmptrap_heartbeat \
        params snmphost=bug0 community=public \
        op monitor interval=10 timeout=300 \
        op start timeout=300 interval=0 \
        op stop timeout=300 interval=0
clone cln-snmp-heartbeat prm-snmp-heartbeat \
        meta interleave=true globally-unique=false ordered=false
notify=false
location bug0-stonith-loc bug0-stonith -inf: bug0
location bug1-stonith-loc bug1-stonith -inf: bug1

The corosync config might be more interesting:

totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    secauth: off
    rrp_mode: passive
    transport: udpu
    token: 240000
    consensus: 1000

    interface {
        ringnumber 0
        bindnetaddr: 203.0.113.0
        mcastport: 5405
        ttl: 1
    }
}
nodelist {
        node {
                ring0_addr: 203.0.113.1
                nodeid: 1
        }
        node {
                ring0_addr: 203.0.113.2
                nodeid: 2
        }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}

> In trace debug logs, I see the STONITH reply sent via the
> > cpg_mcast_joined (libqb) function in crm_cs_flush
> > (stonith_send_async_reply->send_cluster_text->send_cluster_
> text->send_cpg_iov->crm_cs_flush->cpg_mcast_joined)
> >
> > Mar 13 07:18:22 [6466] bug0 stonith-ng: (  commands.c:1891  )   trace:
> > stonith_send_async_reply:        Reply   <st-reply st_origin="bug1"
> > t="stonith-ng" st_op="st_fence" st_device_id="ustonith:0"
> > st_remote_op="39b1f1e0-b76f-4d25-bd15-77b956c914a0"
> > st_clientid="823e92da-8470-491a-b662-215526cced22"
> > st_clientname="crmd.3973" st_target="bug1" st_device_action="st_fence"
> > st_callid="3" st_callopt="0" st_rc="0" st_output="Chassis Power Control:
> > Reset\nChassis Power Control: Down/Off\nChassis Power Control:
> Down/Off\nC
> > Mar 13 07:18:22 [6466] bug0 stonith-ng: (       cpg.c:636   )   trace:
> > send_cluster_text:       Queueing CPG message 9 to all (1041 bytes, 449
> > bytes payload): <st-reply st_origin="bug1" t="stonith-ng"
> > st_op="st_notify" st_device_id="ustonith:0"
> > st_remote_op="39b1f1e0-b76f-4d25-bd15-77b956c914a0"
> > st_clientid="823e92da-8470-491a-b662-215526cced22" st_clientna
> > Mar 13 07:18:22 [6466] bug0 stonith-ng: (       cpg.c:207   )   trace:
> > send_cpg_iov:    Queueing CPG message 9 (1041 bytes)
> > Mar 13 07:18:22 [6466] bug0 stonith-ng: (       cpg.c:170   )   trace:
> > crm_cs_flush:    CPG message sent, size=1041
> > Mar 13 07:18:22 [6466] bug0 stonith-ng: (       cpg.c:185   )   trace:
> > crm_cs_flush:    Sent 1 CPG messages  (0 remaining, last=9): OK (1)
> >
> > But I see no further action from stonith-ng until minutes later;
> > specifically, I don't see remote_op_done run, so the dead node is still
> > an 'online (unclean)' member of the array and no failover can take place.
> >
> > When the token expires (yes, we use a very long token), I see the
> following:
> >
> > Mar 13 07:22:11 [6466] bug0 stonith-ng: (membership.c:1018  )  notice:
> > crm_update_peer_state_iter:      Node bug1 state is now lost | nodeid=2
> > previous=member source=crm_update_peer_proc
> > Mar 13 07:22:11 [6466] bug0 stonith-ng: (      main.c:1275  )   debug:
> > st_peer_update_callback: Broadcasting our uname because of node 2
> > Mar 13 07:22:11 [6466] bug0 stonith-ng: (       cpg.c:636   )   trace:
> > send_cluster_text:       Queueing CPG message 10 to all (666 bytes, 74
> > bytes payload): <stonith_command __name__="stonith_command"
> > t="stonith-ng" st_op="poke"/>
> > ...
> > Mar 13 07:22:11 [6466] bug0 stonith-ng: (  commands.c:2582  )   debug:
> > stonith_command: Processing st_notify reply 0 from bug0 (
>  0)
> > Mar 13 07:22:11 [6466] bug0 stonith-ng: (    remote.c:1945  )   debug:
> > process_remote_stonith_exec:     Marking call to poweroff for bug1 on
> > behalf of crmd.3973 at 39b1f1e0-b76f-4d25-bd15-77b956c914a0.bug1: OK (0)
> >
> > and the STONITH command is finally communicated back to crmd as complete
> > and failover commences.
> >
> > Is this delay a feature of the cpg_mcast_joined function?  If I
> > understand correctly (unlikely), it looks like cpg_mcast_joined is not
> > completing because one of the nodes in the group is missing, but I
> > haven't looked at that code closely yet.  Is it advisable to have
> > stonith-ng broadcast a membership change when it successfully fences a
> node?
> >
> > Attaching logs with PCMK_debug=stonith-ng
> > and PCMK_trace_functions=stonith_send_async_reply,send_cluster_t
> ext,send_cpg_iov,crm_cs_flush
> >
> > Thanks in advance,
> > Chris
>
> Can you share your full pacemaker config (please obfuscate passwords).
>
> --
> Digimer
> Papers and Projects: https://alteeve.com/w/
> "I am, somehow, less interested in the weight and convolutions of
> Einstein’s brain than in the near certainty that people of equal talent
> have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20170313/a1ec5634/attachment-0003.html>