<div dir="ltr">Thanks for your response, Ken. I'm puzzled ... in my case the nodes remain UNCLEAN (offline) until dc-deadtime expires, even when both nodes are up and corosync is quorate.<div><br></div><div>I see the following from crmd when I have dc-deadtime=2min:</div><div><br></div><div><div>Dec 15 21:34:33 max04 crmd[13791]: notice: Quorum acquired</div><div>Dec 15 21:34:33 max04 crmd[13791]: notice: pcmk_quorum_notification: Node max04[2886730248] - state is now member (was (null))<br></div><div>Dec 15 21:34:33 max04 crmd[13791]: notice: pcmk_quorum_notification: Node (null)[2886730249] - state is now member (was (null))<br></div><div>Dec 15 21:34:33 max04 crmd[13791]: notice: Notifications disabled</div><div>Dec 15 21:34:33 max04 crmd[13791]: notice: The local CRM is operational</div><div>Dec 15 21:34:33 max04 crmd[13791]: notice: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]</div><div>...</div><div>Dec 15 21:36:33 max05 crmd[10365]: warning: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING</div><div>Dec 15 21:36:33 max05 crmd[10365]: notice: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]</div><div>Dec 15 21:36:33 max05 crmd[10365]: warning: FSA: Input I_ELECTION_DC from do_election_check() received in state S_INTEGRATION</div><div>Dec 15 21:36:33 max05 crmd[10365]: notice: Notifications disabled</div><div>Dec 15 21:36:33 max04 crmd[13791]: notice: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]</div></div><div><br></div><div>Only after this do the nodes transition to Online. 
This is using the vanilla RHEL 7.2 cluster stack and the following options:</div><div><br></div><div><div>property cib-bootstrap-options: \</div><div> no-quorum-policy=ignore \</div><div> default-action-timeout=120s \</div><div> pe-warn-series-max=1500 \</div><div> pe-input-series-max=1500 \</div><div> pe-error-series-max=1500 \</div><div> stonith-action=poweroff \</div><div> stonith-timeout=900 \</div><div> dc-deadtime=2min \</div><div> maintenance-mode=false \</div><div> have-watchdog=false \</div><div> dc-version=1.1.13-10.el7-44eb2dd \</div><div> cluster-infrastructure=corosync</div><div><div class="gmail_extra"><br></div><div class="gmail_extra">Thanks again,</div><div class="gmail_extra">Chris</div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Dec 15, 2016 at 3:26 PM, Ken Gaillot <span dir="ltr">&lt;<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>On 12/15/2016 02:00 PM, Chris Walker wrote:<br>
> Hello,<br>
><br>
> I have a quick question about dc-deadtime. I believe that Digimer and<br>
> others on this list might have already addressed this, but I want to<br>
> make sure I'm not missing something.<br>
><br>
> If my understanding is correct, dc-deadtime sets the amount of time that<br>
> must elapse before a cluster is formed (DC is elected, etc), regardless<br>
> of which nodes have joined the cluster. In other words, even if all<br>
> nodes that are explicitly enumerated in the nodelist section have<br>
> started Pacemaker, they will still wait dc-deadtime before forming a<br>
> cluster.<br>
><br>
> In my case, I have a two-node cluster on which I'd like to allow a<br>
> pretty long time (~5 minutes) for both nodes to join before giving up on<br>
> them. However, if they both join quickly, I'd like to proceed to form a<br>
> cluster immediately; I don't want to wait for the full five minutes to<br>
> elapse before forming a cluster. Further, if a node doesn't respond<br>
> within five minutes, I want to fence it and start resources on the node<br>
> that is up.<br>
<br>
</span>Pacemaker+corosync behaves as you describe by default.<br>
<br>
dc-deadtime is how long to wait for an election to finish, but if the<br>
election finishes sooner than that (i.e. a DC is elected), it stops<br>
waiting. It doesn't even wait for all nodes, just a quorum.<br>
<br>
Also, with startup-fencing=true (the default), any unseen nodes will be<br>
fenced, and the remaining nodes will proceed to host resources. Of<br>
course, it needs quorum for this, too.<br>
<br>
With two nodes, quorum is handled specially, but that's a different topic.<br>
<span><br>
> With Pacemaker/Heartbeat, the initdead parameter did exactly what I<br>
> want, but I don't see any way to do this with Pacemaker/Corosync. From<br>
> reading other posts, it looks like people use an external agent to start<br>
> HA daemons once nodes are up ... is this a correct understanding?<br>
><br>
> Thanks very much,<br>
> Chris<br>
<br>
</span>
</blockquote></div><br></div></div></div></div>