[ClusterLabs] question about dc-deadtime

Thu Dec 15 22:52:22 CET 2016

Thanks for your response, Ken.  I'm puzzled ... in my case the nodes remain
UNCLEAN (offline) until dc-deadtime expires, even when both nodes are up
and corosync is quorate.

I see the following from crmd when I have dc-deadtime=2min:

Dec 15 21:34:33 max04 crmd[13791]:   notice: Quorum acquired
Dec 15 21:34:33 max04 crmd[13791]:   notice: pcmk_quorum_notification: Node max04[2886730248] - state is now member (was (null))
Dec 15 21:34:33 max04 crmd[13791]:   notice: pcmk_quorum_notification: Node (null)[2886730249] - state is now member (was (null))
Dec 15 21:34:33 max04 crmd[13791]:   notice: Notifications disabled
Dec 15 21:34:33 max04 crmd[13791]:   notice: The local CRM is operational
Dec 15 21:34:33 max04 crmd[13791]:   notice: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
...
Dec 15 21:36:33 max05 crmd[10365]:  warning: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Dec 15 21:36:33 max05 crmd[10365]:   notice: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]
Dec 15 21:36:33 max05 crmd[10365]:  warning: FSA: Input I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
Dec 15 21:36:33 max05 crmd[10365]:   notice: Notifications disabled
Dec 15 21:36:33 max04 crmd[13791]:   notice: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]

Only after this do the nodes transition to Online.  This is using the
vanilla RHEL 7.2 cluster stack and the following options:

property cib-bootstrap-options: \
        no-quorum-policy=ignore \
        default-action-timeout=120s \
        pe-warn-series-max=1500 \
        pe-input-series-max=1500 \
        pe-error-series-max=1500 \
        stonith-action=poweroff \
        stonith-timeout=900 \
        dc-deadtime=2min \
        maintenance-mode=false \
        have-watchdog=false \
        dc-version=1.1.13-10.el7-44eb2dd \
        cluster-infrastructure=corosync
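
As a rough sanity check on the timing, here is a small Python sketch (not
part of the cluster stack) that compares the gap between the S_PENDING
transition and the I_DC_TIMEOUT input in the log excerpt above with the
configured dc-deadtime.  The two log lines are copied from the excerpt; the
helper names syslog_time and to_seconds are made up for illustration, the
year is assumed because syslog does not record it, and the comparison
assumes both nodes started crmd at about the same time with synchronized
clocks (max05's own S_PENDING line is elided above).

```python
from datetime import datetime

# Two lines copied from the crmd log excerpt (wrapped lines rejoined).
PENDING_LINE = (
    "Dec 15 21:34:33 max04 crmd[13791]:   notice: State transition "
    "S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL "
    "origin=do_started ]"
)
TIMEOUT_LINE = (
    "Dec 15 21:36:33 max05 crmd[10365]:  warning: FSA: Input "
    "I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING"
)


def syslog_time(line, year=2016):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; the year is an assumption."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def to_seconds(value):
    """Hypothetical helper: handles only the 'min' and 's' suffixes used here.

    A bare number is treated as seconds (an assumption of this sketch).
    """
    for suffix, factor in (("min", 60), ("s", 1)):
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)


# Seconds between entering S_PENDING and the DC timer popping.
elapsed = (syslog_time(TIMEOUT_LINE) - syslog_time(PENDING_LINE)).total_seconds()
deadtime = to_seconds("2min")  # value of dc-deadtime in the options above

print(f"elapsed={elapsed:.0f}s dc-deadtime={deadtime}s match={elapsed == deadtime}")
```

If the two values agree, the nodes sat in S_PENDING for the full
dc-deadtime rather than electing a DC as soon as both had joined.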

Thanks again,
Chris

On Thu, Dec 15, 2016 at 3:26 PM, Ken Gaillot <kgaillot at redhat.com> wrote:

> On 12/15/2016 02:00 PM, Chris Walker wrote:
> > Hello,
> >
> > I have a quick question about dc-deadtime.  I believe that Digimer and
> > others on this list might have already addressed this, but I want to
> > make sure I'm not missing something.
> >
> > If my understanding is correct, dc-deadtime sets the amount of time that
> > must elapse before a cluster is formed (DC is elected, etc), regardless
> > of which nodes have joined the cluster.  In other words, even if all
> > nodes that are explicitly enumerated in the nodelist section have
> > started Pacemaker, they will still wait dc-deadtime before forming a
> > cluster.
> >
> > In my case, I have a two-node cluster on which I'd like to allow a
> > pretty long time (~5 minutes) for both nodes to join before giving up on
> > them.  However, if they both join quickly, I'd like to proceed to form a
> > cluster immediately; I don't want to wait for the full five minutes to
> > elapse before forming a cluster.  Further, if a node doesn't respond
> > within five minutes, I want to fence it and start resources on the node
> > that is up.
>
> Pacemaker+corosync behaves as you describe by default.
>
> dc-deadtime is how long to wait for an election to finish, but if the
> election finishes sooner than that (i.e. a DC is elected), it stops
> waiting. It doesn't even wait for all nodes, just a quorum.
>
> Also, with startup-fencing=true (the default), any unseen nodes will be
> fenced, and the remaining nodes will proceed to host resources. Of
> course, it needs quorum for this, too.
>
> With two nodes, quorum is handled specially, but that's a different topic.
>
> > With Pacemaker/Heartbeat, the initdead parameter did exactly what I
> > want, but I don't see any way to do this with Pacemaker/Corosync.  From
> > reading other posts, it looks like people use an external agent to start
> > HA daemons once nodes are up ... is this a correct understanding?
> >
> > Thanks very much,
> > Chris