[Pacemaker] configuration variants for 2 node cluster

Kostiantyn Ponomarenko konstantin.ponomarenko at gmail.com
Mon Jun 23 10:36:32 EDT 2014

Hi Digimer,

Suppose I disable the cluster on start-up; but what about the remaining node
if I need to reboot it?
Even if the connection between these two nodes is lost, I need to have one
node working and providing resources.
How did you solve this situation?
Should there be a separate daemon which somehow checks the connection between
the two nodes and decides whether to start Corosync and Pacemaker or to keep
them down?

Thank you,

On Mon, Jun 23, 2014 at 4:34 PM, Digimer <lists at alteeve.ca> wrote:

> On 23/06/14 09:11 AM, Kostiantyn Ponomarenko wrote:
>> Hi guys,
>> I want to gather all possible configuration variants for a 2-node cluster,
>> because it has a lot of pitfalls and there is not much information about it
>> across the internet. I also have some questions about these configurations
>> and their specific problems.
>> -----------------
>> We can use the "two_node" and "wait_for_all" options from Corosync's
>> votequorum, and set up fencing agents with a delay on one of them (a
>> minimal corosync.conf sketch follows the notes below).
>> Here is the workflow of this configuration:
>> 1. Node starts.
>> 2. Cluster starts (Corosync and Pacemaker) at boot time.
>> 3. Wait for all nodes. Have all nodes joined?
>>      No. Go to step 3.
>>      Yes. Go to step 4.
>> 4. Start resources.
>> 5. Split-brain situation (something goes wrong with the connection between
>> the nodes).
>> 6. The fencing agent on one of the nodes reboots the other node (there is a
>> configured delay on one of the fencing agents).
>> 7. The rebooted node goes to step 1.
>> There are two (or more?) important things to note in this configuration:
>> 1. The rebooted node keeps waiting for all nodes to become visible (the
>> connection has to be restored first).
>> 2. Suppose the connection problem still exists and the node which rebooted
>> the other one has to be rebooted as well (for some reason). After the
>> reboot it is also stuck at step 3 because of the connection problem.
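>> For reference, the quorum part of corosync.conf for this variant would look
>> roughly like this (a minimal sketch of the votequorum settings only):
>>
>>     quorum {
>>         provider: corosync_votequorum
>>         two_node: 1
>>         wait_for_all: 1
>>     }
>>
>> (With "two_node: 1", wait_for_all is enabled by default anyway, but setting
>> it explicitly makes the intent clear.)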
>> -----------------
>> Is it possible to somehow assign the node that won the reboot race (i.e.
>> rebooted the other one) a status like "primary" and allow it not to wait
>> for all nodes after a reboot, and then drop this status once the other
>> node has joined again?
>> So is it possible?
>> Right now that's the only configuration I know for a 2-node cluster.
>> Other variants would be very much appreciated =)
>> VARIANT 2 (not implemented, just a suggestion):
>> -----------------
>> I've been thinking about using an external SSD drive (or another external
>> drive). For example, a fencing agent could reserve the SSD using a SCSI
>> command and only then reboot the other node.
>> The main idea is that the first node, as soon as the cluster starts on it,
>> reserves the SSD until the other node joins the cluster; after that the
>> SCSI reservation is removed. (A rough command-level sketch follows the
>> list below.)
>> 1. Node starts.
>> 2. Cluster starts (Corosync and Pacemaker) at boot time.
>> 3. Reserve the SSD. Did it manage to reserve it?
>>      No. Don't start resources (wait for all).
>>      Yes. Go to step 4.
>> 4. Start resources.
>> 5. Remove the SCSI reservation once the other node has joined.
>> 6. Split-brain situation (something goes wrong with the connection between
>> the nodes).
>> 7. The fencing agent tries to reserve the SSD. Did it manage to reserve it?
>>      No. Maybe put the node in standby mode ...
>>      Yes. Reboot the other node.
>> 8. Optional: a single node can keep the SSD reservation as long as it is
>> alone in the cluster, or until it shuts down.
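>> A rough command-level sketch of the reservation step, using sg_persist from
>> sg3_utils (the device path and key are made-up examples):
>>
>>     # register this node's key on the shared SSD
>>     sg_persist --out --register --param-sark=0x1 /dev/sdx
>>     # try to take the reservation (type 5 = write exclusive, registrants only)
>>     sg_persist --out --reserve --param-rk=0x1 --prout-type=5 /dev/sdx
>>
>> The existing fence_scsi agent already does fencing via SCSI-3 persistent
>> reservations, so it might be a useful starting point for this variant.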
>> I am really looking forward to finding the best solution (or a couple of
>> them =)).
>> Hope I am not the only person who is interested in this topic.
>> Thank you,
>> Kostya
> Hi Kostya,
>   I only build 2-node clusters, and I've not had problems with this going
> back to 2009 over dozens of clusters. The tricks I found are:
> * Disable quorum (of course)
> * Set up good fencing, and add a delay to the node you prefer (or pick one
> at random, if they are of equal value) to avoid dual fences
> * Disable the cluster on start-up, to prevent fence loops (a rough pcs
> sketch of these three steps follows).
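> For reference, with pcs that would look roughly like this (a sketch only;
> the fence agent, addresses and delay value are examples and need to be
> adapted to your hardware):
>
>     # "disable quorum" from pacemaker's point of view
>     pcs property set no-quorum-policy=ignore
>     pcs property set stonith-enabled=true
>
>     # fencing for both nodes; the delay goes only on the device that fences
>     # the preferred node, so that node wins the fence race
>     pcs stonith create fence-node1 fence_ipmilan ipaddr=10.20.0.1 \
>         login=admin passwd=secret pcmk_host_list=node1 delay=15
>     pcs stonith create fence-node2 fence_ipmilan ipaddr=10.20.0.2 \
>         login=admin passwd=secret pcmk_host_list=node2
>
>     # don't start the cluster automatically at boot
>     chkconfig corosync off; chkconfig pacemaker off
>     # (or: systemctl disable corosync pacemaker)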
>   That's it. With this, your 2-node cluster will be just fine.
>   As for your question: once a node is fenced successfully, the resource
> manager (pacemaker) will take over any services lost on the fenced node, if
> that is how you configured it. A node that either gracefully leaves or
> dies/is fenced should not interfere with the remaining node.
>   The problem is when a node vanishes and fencing fails. Then, not knowing
> what the other node might be doing, the only safe option is to block,
> otherwise you risk a split-brain. This is why fencing is so important.
> Cheers
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org