[Pacemaker] DC election with downed node in 2-way cluster

Miki Shapiro Miki.Shapiro at coles.com.au
Wed Jan 13 03:12:48 EST 2010


Halt = soft power-off: a poweroff command issued on the node itself, which shuts everything down cleanly and then powers the blade off.

I'll send the logs tomorrow (our timezone is just wrapping up for the day).

Thanks!

From: Andrew Beekhof [mailto:andrew at beekhof.net]
Sent: Wednesday, 13 January 2010 7:07 PM
To: pacemaker at oss.clusterlabs.org
Subject: Re: [Pacemaker] DC election with downed node in 2-way cluster


On Wed, Jan 13, 2010 at 3:25 AM, Miki Shapiro <Miki.Shapiro at coles.com.au> wrote:
Hi all

I'm attempting to build a 2-way cluster on SLES 11 with an openais/pacemaker stack. I've got the nodes and one resource (a DRBD volume) up and running. What I'm not sure about is how the active CRM DC election process works.

I configured a null stonith resource for each node.
I have stonith-enabled set to true (I will implement a real stonith facility once the final solution is in place).
I have no-quorum-policy set to ignore (as the cluster is expected to work with one node active).
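
For reference, the two cluster properties described above could be set from the crm shell roughly as follows. This is a minimal sketch, not the poster's actual configuration: the function name is made up for illustration, it assumes the crm shell is on the PATH, and it leaves out the null stonith resources.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): apply the two cluster
# properties the message describes, using the crm shell.
apply_two_node_properties() {
    # Keep fencing enabled; the post pairs this with placeholder null
    # stonith resources until a real fencing device is available.
    crm configure property stonith-enabled=true || return 1
    # Keep managing resources when quorum is lost, since the cluster is
    # expected to keep working with a single active node.
    crm configure property no-quorum-policy=ignore || return 1
}
```

Calling `apply_two_node_properties` on one node should be enough, since cluster properties live in the shared configuration rather than per node.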

I look at crm_mon or crm_gui, and it's all green and happy.

I now go and halt a node.

define "halt"


Observing crm_mon or crm_gui on node2, I expect to see:

1. Services appear as down, thanks to resource monitoring directives.

2. The quorum broken (... do I care?)

3. The surviving node elected as the new DC. The book states (here: <http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-cluster-status.html>, at the bottom) that:

"The DC (Designated Controller) node is where all the decisions are made and if the current DC fails a new one is elected from the remaining cluster nodes. The choice of DC is of no significance to an administrator beyond the fact that its logs will generally be more interesting."

To me it is of significance: I want the brain, as far as the surviving node is concerned, to be running on a non-halted server.


What happens in practice is:

If I halt the DC,

1. Resources DO appear stopped and do-their-thing(tm)

2. [PROBLEM?] Quorum DOES NOT appear as broken

3. [PROBLEM?] The remaining node DOES NOT get (visibly) elected as the new DC.

If I halt the non-DC node,

1. Resources DO appear stopped and do-their-thing(tm)

2. Quorum DOES appear as broken

3. [PROBLEM?] The remaining node DOES NOT get (visibly) elected as the new DC.
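
The DC and quorum observations above can also be captured as text from the surviving node rather than read off the display. The sketch below assumes the one-shot output of `crm_mon -1` contains a summary line of the form `Current DC: <node> - partition with quorum` (or `partition WITHOUT quorum`); that line format is an assumption about this Pacemaker version, and the `dc_status` function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read crm_mon one-shot output on stdin and print the
# DC name and quorum state found on the "Current DC:" summary line.
dc_status() {
    awk '
        /^Current DC:/ {
            dc = $3                      # third field is the node name
            quorum = "unknown"
            if ($0 ~ /partition with quorum/)    quorum = "yes"
            if ($0 ~ /partition WITHOUT quorum/) quorum = "no"
            found = 1
        }
        END {
            # No summary line at all: report it and fail.
            if (!found) { print "dc=unknown quorum=unknown"; exit 1 }
            print "dc=" dc " quorum=" quorum
        }
    '
}
```

On a live node this would be run as `crm_mon -1 | dc_status`, once before and once after halting the peer, so the two results can be compared and attached alongside the logs.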

Now if my understanding serves me right, the DC is the baton-holding CRM that does the thinking for the entire cluster. If the surviving node1 thinks that the (DEAD) node2 is the de-facto brains of the cluster and doesn't take the reins, I have a dysfunctional cluster.

Can someone please offer some clarification on how one would reasonably expect this to work?

Not without logs (one per scenario as bzip'd attachments please).
