[ClusterLabs] question about dc-deadtime

Mon Jan 9 18:55:10 EST 2017

On Fri, Dec 16, 2016 at 8:52 AM, Chris Walker
<christopher.walker at gmail.com> wrote:
> Thanks for your response Ken.  I'm puzzled ... in my case nodes remain
> UNCLEAN (offline) until dc-deadtime expires, even when both nodes are up and
> corosync is quorate.

I'm guessing you're starting both nodes at the same time?
The behaviour you're seeing is arguably a hangover from the multicast
days, when corosync wouldn't have had a node list.

But since that's not the common case anymore, we could probably
shortcut the timeout if we know the complete node list and see that
they are all online.
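
Roughly, the check could look like the sketch below. This is Python for
illustration only, not actual crmd code: the function names are
hypothetical, and it assumes we can get both the configured node list and
the set of currently online members as plain lists of node names.

```python
def can_skip_dc_deadtime(configured_nodes, online_nodes):
    """Return True when every node in the configured node list is online.

    Hypothetical helper, not part of Pacemaker.
    """
    configured = set(configured_nodes)
    # Without a node list (the old multicast case) we cannot know that
    # membership is complete, so we must keep waiting for the timer.
    if not configured:
        return False
    return configured <= set(online_nodes)


def election_wait(configured_nodes, online_nodes, dc_deadtime_s):
    """Seconds to wait before moving on to the election (hypothetical)."""
    if can_skip_dc_deadtime(configured_nodes, online_nodes):
        return 0
    return dc_deadtime_s
```

With the two nodes from the logs above, election_wait(["max04", "max05"],
["max04", "max05"], 120) would skip the wait, while a missing peer or an
empty node list would still fall back to the full dc-deadtime.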

>
> I see the following from crmd when I have dc-deadtime=2min
>
> Dec 15 21:34:33 max04 crmd[13791]:   notice: Quorum acquired
> Dec 15 21:34:33 max04 crmd[13791]:   notice: pcmk_quorum_notification: Node
> max04[2886730248] - state is now member (was (null))
> Dec 15 21:34:33 max04 crmd[13791]:   notice: pcmk_quorum_notification: Node
> (null)[2886730249] - state is now member (was (null))
> Dec 15 21:34:33 max04 crmd[13791]:   notice: Notifications disabled
> Dec 15 21:34:33 max04 crmd[13791]:   notice: The local CRM is operational
> Dec 15 21:34:33 max04 crmd[13791]:   notice: State transition S_STARTING ->
> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> ...
> Dec 15 21:36:33 max05 crmd[10365]:  warning: FSA: Input I_DC_TIMEOUT from
> crm_timer_popped() received in state S_PENDING
> Dec 15 21:36:33 max05 crmd[10365]:   notice: State transition S_ELECTION ->
> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED
> origin=election_timeout_popped ]
> Dec 15 21:36:33 max05 crmd[10365]:  warning: FSA: Input I_ELECTION_DC from
> do_election_check() received in state S_INTEGRATION
> Dec 15 21:36:33 max05 crmd[10365]:   notice: Notifications disabled
> Dec 15 21:36:33 max04 crmd[13791]:   notice: State transition S_PENDING ->
> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE
> origin=do_cl_join_finalize_respond ]
>
> only after this do the nodes transition to Online.  This is using the
> vanilla RHEL7.2 cluster stack and the following options:
>
> property cib-bootstrap-options: \
>         no-quorum-policy=ignore \
>         default-action-timeout=120s \
>         pe-warn-series-max=1500 \
>         pe-input-series-max=1500 \
>         pe-error-series-max=1500 \
>         stonith-action=poweroff \
>         stonith-timeout=900 \
>         dc-deadtime=2min \
>         maintenance-mode=false \
>         have-watchdog=false \
>         dc-version=1.1.13-10.el7-44eb2dd \
>         cluster-infrastructure=corosync
>
> Thanks again,
> Chris
>
> On Thu, Dec 15, 2016 at 3:26 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
>>
>> On 12/15/2016 02:00 PM, Chris Walker wrote:
>> > Hello,
>> >
>> > I have a quick question about dc-deadtime.  I believe that Digimer and
>> > others on this list might have already addressed this, but I want to
>> > make sure I'm not missing something.
>> >
>> > If my understanding is correct, dc-deadtime sets the amount of time that
>> > must elapse before a cluster is formed (DC is elected, etc), regardless
>> > of which nodes have joined the cluster.  In other words, even if all
>> > nodes that are explicitly enumerated in the nodelist section have
>> > started Pacemaker, they will still wait dc-deadtime before forming a
>> > cluster.
>> >
>> > In my case, I have a two-node cluster on which I'd like to allow a
>> > pretty long time (~5 minutes) for both nodes to join before giving up on
>> > them.  However, if they both join quickly, I'd like to proceed to form a
>> > cluster immediately; I don't want to wait for the full five minutes to
>> > elapse before forming a cluster.  Further, if a node doesn't respond
>> > within five minutes, I want to fence it and start resources on the node
>> > that is up.
>>
>> Pacemaker+corosync behaves as you describe by default.
>>
>> dc-deadtime is how long to wait for an election to finish, but if the
>> election finishes sooner than that (i.e. a DC is elected), it stops
>> waiting. It doesn't even wait for all nodes, just a quorum.
>>
>> Also, with startup-fencing=true (the default), any unseen nodes will be
>> fenced, and the remaining nodes will proceed to host resources. Of
>> course, it needs quorum for this, too.
>>
>> With two nodes, quorum is handled specially, but that's a different topic.
>>
>> > With Pacemaker/Heartbeat, the initdead parameter did exactly what I
>> > want, but I don't see any way to do this with Pacemaker/Corosync.  From
>> > reading other posts, it looks like people use an external agent to start
>> > HA daemons once nodes are up ... is this a correct understanding?
>> >
>> > Thanks very much,
>> > Chris
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org