[Pacemaker] DC election with downed node in 2-way cluster

Andrew Beekhof andrew at beekhof.net
Wed Jan 13 08:06:45 UTC 2010


On Wed, Jan 13, 2010 at 3:25 AM, Miki Shapiro <Miki.Shapiro at coles.com.au> wrote:

>  Hi all
>
>
>
> I’m attempting to build a 2-way cluster, SLES-11-based with an
> openais/pacemaker stack. I’ve got the nodes and a resource (a drbd volume)
> happening. What I’m not sure about is the active CRM DC election process.
>
>
>
> I configured a null stonith resource for each node.
>
> I have stonith-enabled set to true (I will implement a real stonith
> facility once the final solution is in place).
>
> I have no-quorum-policy set to ignore (as the cluster is expected to work
> with one node active).
>
>
>
> I look at crm_mon or crm_gui, and it’s all green and happy.
>
>
>
> I now go and halt a node.
>

define "halt"


>
>
> Observing crm_mon or crm_gui on node2, I expect to see :
>
> 1.       Services appear as down thanks to resource monitoring directives.
>
> 2.       The quorum broken (… do I care?)
>
> 3.       The new node elected as DC. Despite what the book states (here:
> <http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-cluster-status.html>
> at the bottom) that:
>
> *“The DC (Designated Controller) node is where all the decisions are made
> and if the current DC fails a new one is elected from the remaining cluster
> nodes. The choice of DC is of no significance to an administrator beyond the
> fact that its logs will generally be more interesting.”*
>
>
>
> To me, the choice of DC is of significance. I want the brain, as far as
> the surviving node is concerned, to be running on a non-halted server.
>
>
>
> What happens in practice is:
>
> If I halt the DC,
>
> 1.       Resources DO appear stopped and do-their-thing™
>
> 2.       [PROBLEM?] Quorum DOES NOT appear as broken
>
> 3.       [PROBLEM?] The remaining node DOES NOT get (visibly) elected as
> the new DC.
>
> If I halted the non-DC node,
>
> 1.       Resources DO appear stopped and do-their-thing™
>
> 2.       Quorum DOES appear as broken
>
> 3.       [PROBLEM?]The remaining node DOES NOT get (visibly) elected as
> the new DC.
>
>
>
> Now if my understanding serves me right, the DC is the baton-holding CRM
> that does the thinking for the entire cluster. If the surviving node1 thinks
> that the (DEAD) node2 is the de-facto brains of the cluster and doesn’t take
> the reins, I have a dysfunctional cluster.
>
>
>
> Can someone please offer some clarification on how one would reasonably
> expect this to work?
>

Not without logs (one per scenario as bzip'd attachments please).
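
When gathering evidence for each scenario, it can help to record what the surviving node reports as the DC and whether it believes it has quorum. The following is only a rough sketch: it assumes the one-shot status text contains a header line of the form "Current DC: <name> - partition with quorum" (or "WITHOUT quorum", or "NONE" when no DC is known). The sample text and the function name `parse_dc` are made up for illustration and are not taken from the poster's cluster.

```python
import re

# Hypothetical status text; the exact layout is an assumption and may
# differ between Pacemaker versions.
SAMPLE = """\
============
Stack: openais
Current DC: node2 - partition WITHOUT quorum
2 Nodes configured, 2 expected votes
============
Online: [ node2 ]
OFFLINE: [ node1 ]
"""

# Matches the DC header line; the quorum suffix is optional so that a
# bare "Current DC: NONE" line is still recognised.
DC_LINE = re.compile(
    r"^Current DC:\s*(\S+)(?:\s+-\s+partition\s+(with|WITHOUT)\s+quorum)?",
    re.MULTILINE,
)


def parse_dc(text):
    """Return (dc_name, has_quorum); either value is None if unknown."""
    match = DC_LINE.search(text)
    if match is None:
        return None, None
    name, quorum = match.group(1), match.group(2)
    dc_name = None if name == "NONE" else name
    has_quorum = None if quorum is None else quorum == "with"
    return dc_name, has_quorum


if __name__ == "__main__":
    print(parse_dc(SAMPLE))
```

In practice the text would come from a one-shot run such as `crm_mon -1` on the surviving node, captured once per scenario next to the logs.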