[Pacemaker] configuration variants for 2 node cluster

Kostiantyn Ponomarenko konstantin.ponomarenko at gmail.com
Tue Jun 24 04:36:32 EDT 2014


Hi Chrissie,

But wait_for_all doesn't help when there is no connection between the nodes:
if I then need to reboot the remaining working node, I won't get a working
cluster afterwards - both nodes will be waiting for the connection between
them to come back.
That's why I am looking for a solution that would let one node keep working
in this situation (after a reboot).
I've been thinking about some kind of marker that could help a node determine
the state of the other node - for example an external disk and SCSI
reservation commands. Maybe you could suggest another kind of marker?
I am also not sure whether we can use the presence of a file on an external
SSD as the marker, along the lines of: if the file is there, the other node
is alive; if not, the node is dead.
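
For illustration, a minimal sketch of the SCSI-reservation idea, using
sg_persist from sg3_utils (the device path and reservation key below are only
placeholders):

    # Register a (placeholder) key and try to take the reservation on the
    # shared disk; succeeding would suggest the other node does not hold it.
    sg_persist --out --register --param-sark=0xABC1 /dev/sdx
    sg_persist --out --reserve --param-rk=0xABC1 --prout-type=1 /dev/sdx \
        && echo "reservation acquired - peer presumably down"
    # Read back the current reservation to see whether the peer holds it.
    sg_persist --in --read-reservation /dev/sdx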

Digimer,

Thanks for the links and information.
Anyway, if I go this way, I will write my own daemon to determine the state of
the other node.
Also, the information about fence loops is new to me, thanks =)

Thank you,
Kostya


On Tue, Jun 24, 2014 at 10:55 AM, Christine Caulfield <ccaulfie at redhat.com>
wrote:

> On 23/06/14 15:49, Digimer wrote:
>
>> Hi Kostya,
>>
>>    I'm having a little trouble understanding your question, sorry.
>>
>>    On boot, the node will not start anything, so after booting it, you
>> log in, check that it can talk to the peer node (a simple ping is
>> generally enough), then start the cluster. It will join the peer's
>> existing cluster (even if that cluster currently consists of just the peer).
>>
>>    If you booted both nodes, say after a power outage, you will check
>> the connection (again, a simple ping is fine) and then start the cluster
>> on both nodes at the same time.
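>>
>>    For illustration, that boot procedure might look roughly like this on a
>> pcs-managed node ("node2" is just an example peer name; without pcs you
>> would start the corosync and pacemaker services directly):
>>
>>     ping -c 3 node2 && pcs cluster start
>>     # or, without pcs:
>>     # service corosync start && service pacemaker start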
>>
>
>
> wait_for_all helps with most of these situations. If a node goes down, then
> after it comes back up it won't start services until it has seen the
> non-failed node, because wait_for_all prevents a newly rebooted node from
> doing anything on its own. This also takes care of the case where both nodes
> are rebooted together, of course, because that's the same as a fresh start.
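>
> For reference, the votequorum settings being discussed live in corosync.conf;
> a sketch of a typical two-node quorum section (nodelist and interface details
> omitted) would be:
>
>     quorum {
>         provider: corosync_votequorum
>         two_node: 1
>         # two_node already implies wait_for_all; shown here for clarity
>         wait_for_all: 1
>     }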
>
> Chrissie
>
>
>>    If one of the nodes needs to be shut down, say for repairs or
>> upgrades, you migrate the services off of it and over to the peer node,
>> then you stop the cluster (which tells the peer that the node is leaving
>> the cluster). After that, the remaining node operates by itself. When
>> you turn it back on, you rejoin the cluster and migrate the services back.
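>>
>>    For example, with pcs that sequence is roughly as follows (resource and
>> node names are placeholders; the move constraint left behind should be
>> cleared again afterwards):
>>
>>     pcs resource move my_resource node2    # push the services to the peer
>>     pcs cluster stop                       # leave the cluster cleanly
>>     # ... do the maintenance / reboot, then:
>>     pcs cluster start                      # rejoin the cluster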
>>
>>    I think, maybe, you are making this more complicated than it needs to
>> be. Pacemaker and corosync will handle most of this for you, once set up
>> properly. What operating system do you plan to use, and what cluster
>> stack? I suspect it will be corosync + pacemaker, which should work fine.
>>
>> digimer
>>
>> On 23/06/14 10:36 AM, Kostiantyn Ponomarenko wrote:
>>
>>> Hi Digimer,
>>>
>>> Suppose I disabled the cluster on start-up, but what about the remaining
>>> node if I need to reboot it?
>>> So even if the connection between these two nodes is lost, I need to
>>> have one node working and providing resources.
>>> How did you solve this situation?
>>> Should it be a separate daemon that somehow checks the connection between
>>> the two nodes and decides whether to run Corosync and Pacemaker or to keep
>>> them down?
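>>>
>>> (For illustration, a minimal sketch of such a helper - a boot-time script
>>> rather than a full daemon; the peer hostname and service commands are only
>>> placeholders and would need adapting:)
>>>
>>>     #!/bin/sh
>>>     # Hypothetical boot-time check: only start the cluster stack if the
>>>     # peer node answers, otherwise leave the services stopped.
>>>     PEER=node2                              # placeholder peer hostname
>>>     if ping -c 3 -W 2 "$PEER" >/dev/null 2>&1; then
>>>         service corosync start && service pacemaker start
>>>     else
>>>         echo "peer $PEER unreachable - cluster services left stopped" >&2
>>>     fi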
>>>
>>> Thank you,
>>> Kostya
>>>
>>>
>>> On Mon, Jun 23, 2014 at 4:34 PM, Digimer <lists at alteeve.ca> wrote:
>>>
>>>     On 23/06/14 09:11 AM, Kostiantyn Ponomarenko wrote:
>>>
>>>         Hi guys,
>>>         I want to gather all possible configuration variants for a 2-node
>>>         cluster, because it has a lot of pitfalls and there is not much
>>>         information about it across the internet. I also have some questions
>>>         about the configurations and their specific problems.
>>>         VARIANT 1:
>>>         -----------------
>>>         We can use the "two_node" and "wait_for_all" options from Corosync's
>>>         votequorum, and set up fencing agents with a delay on one of them
>>>         (see the pcs sketch below).
>>>         Here is a workflow (diagram) of this configuration:
>>>         1. Node starts.
>>>         2. Cluster starts (Corosync and Pacemaker) at boot time.
>>>         3. Wait for all nodes. Have all nodes joined?
>>>               No. Go to step 3.
>>>               Yes. Go to step 4.
>>>         4. Start resources.
>>>         5. Split-brain situation (something wrong with the connection between
>>>         the nodes).
>>>         6. The fencing agent on one of the nodes reboots the other node
>>>         (there is a configured delay on one of the fencing agents).
>>>         7. The rebooted node goes to step 1.
>>>         There are two (or more?) important things in this configuration:
>>>         1. The rebooted node keeps waiting for all nodes to be visible (the
>>>         connection has to be restored).
>>>         2. Suppose the connection problem still exists and the node which
>>>         rebooted the other one has to be rebooted as well (for some reason).
>>>         After the reboot it is also stuck on step 3 because of the connection
>>>         problem.
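>>>         For illustration, the delayed fencing mentioned above might be
>>>         configured with pcs roughly like this (fence_ipmilan is only an
>>>         example agent; addresses, credentials and node names are
>>>         placeholders):
>>>             # The device that fences node1 gets the delay, so in a fence
>>>             # race node1 survives: fencing it is held back while node1
>>>             # fences node2 immediately.
>>>             pcs stonith create fence-node1 fence_ipmilan \
>>>                 pcmk_host_list=node1 ipaddr=10.0.0.1 login=admin \
>>>                 passwd=secret delay=15
>>>             pcs stonith create fence-node2 fence_ipmilan \
>>>                 pcmk_host_list=node2 ipaddr=10.0.0.2 login=admin \
>>>                 passwd=secret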
>>>         QUESTION:
>>>         -----------------
>>>         Is it possible to somehow assign the node that won the reboot race
>>>         (i.e. rebooted the other one) a status like "primary" and allow it
>>>         not to wait for all nodes after a reboot, and then drop this status
>>>         once the other node has joined it again?
>>>         So is it possible?
>>>         Right now that's the only configuration I know for a 2-node cluster.
>>>         Other variants are very much appreciated =)
>>>         VARIANT 2 (not implemented, just a suggestion):
>>>         -----------------
>>>         I've been thinking about using an external SSD (or another external
>>>         drive). For example, a fencing agent could reserve the SSD with a
>>>         SCSI command and only then reboot the other node.
>>>         The main idea is that the first node, as soon as the cluster starts
>>>         on it, reserves the SSD until the other node joins the cluster;
>>>         after that the SCSI reservation is removed.
>>>         1. Node starts.
>>>         2. Cluster starts (Corosync and Pacemaker) at boot time.
>>>         3. Reserve the SSD. Did it manage to reserve it?
>>>               No. Don't start resources (wait for all).
>>>               Yes. Go to step 4.
>>>         4. Start resources.
>>>         5. Remove the SCSI reservation when the other node has joined.
>>>         6. Split-brain situation (something wrong with the connection between
>>>         the nodes).
>>>         7. The fencing agent tries to reserve the SSD. Did it manage to?
>>>               No. Maybe put the node in standby mode ...
>>>               Yes. Reboot the other node.
>>>         8. Optional: a single node can keep the SSD reservation while it is
>>>         alone in the cluster, or until it shuts down.
>>>         I am really looking forward to finding the best solution (or a
>>>         couple of them =)).
>>>         Hope I am not the only person who is interested in this topic.
>>>
>>>
>>>         Thank you,
>>>         Kostya
>>>
>>>
>>>     Hi Kostya,
>>>
>>>        I only build 2-node clusters, and I've not had problems with this
>>>     going back to 2009 over dozens of clusters. The tricks I found are:
>>>
>>>     * Disable quorum (of course)
>>>     * Set up good fencing, and add a delay to the node you prefer (or
>>>     pick one at random, if they're of equal value) to avoid dual fences
>>>     * Disable the cluster on start-up, to prevent fence loops.
>>>
>>>        That's it. With this, your 2-node cluster will be just fine.
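>>>
>>>        For example (a rough sketch only; exact commands depend on the
>>>     distro and on whether you drive things with pcs or crmsh):
>>>
>>>         pcs property set no-quorum-policy=ignore    # "disable quorum"
>>>         chkconfig corosync off                      # don't start the cluster
>>>         chkconfig pacemaker off                     # at boot (avoids fence loops)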
>>>
>>>        As for your question: once a node is fenced successfully, the
>>>     resource manager (Pacemaker) will take over any services lost on the
>>>     fenced node, if that is how you configured it. A node that either
>>>     gracefully leaves or dies/is fenced should not interfere with the
>>>     remaining node.
>>>
>>>        The problem is when a node vanishes and fencing fails. Then, not
>>>     knowing what the other node might be doing, the only safe option is
>>>     to block, otherwise you risk a split-brain. This is why fencing is
>>>     so important.
>>>
>>>     Cheers
>>>
>>>     --
>>>     Digimer
>>>     Papers and Projects: https://alteeve.ca/w/
>>>     What if the cure for cancer is trapped in the mind of a person
>>>     without access to education?
>>>