[Pacemaker] configuration variants for 2 node cluster

Kostiantyn Ponomarenko konstantin.ponomarenko at gmail.com
Mon Jun 23 10:36:32 EDT 2014

Hi Digimer,

Suppose I disable the cluster on start-up; but what about the remaining node
if I need to reboot it?
Even if the connection between these two nodes is lost, I need to have one
node working and providing resources.
How did you solve this situation?
Should there be a separate daemon which somehow checks the connection between
the two nodes and decides whether to start Corosync and Pacemaker or to keep
them down?

Thank you,

On Mon, Jun 23, 2014 at 4:34 PM, Digimer <lists at alteeve.ca> wrote:

> On 23/06/14 09:11 AM, Kostiantyn Ponomarenko wrote:
>> Hi guys,
>> I want to gather all possible configuration variants for a 2-node cluster,
>> because it has a lot of pitfalls and there is not much information about it
>> across the internet. I also have some questions about these configurations
>> and their specific problems.
>> -----------------
>> We can use the "two_node" and "wait_for_all" options from Corosync's
>> votequorum, and set up fencing agents with a delay on one of them (a
>> minimal corosync.conf sketch follows the notes below).
>> Here is the workflow of this configuration:
>> 1. Node starts.
>> 2. Cluster starts (Corosync and Pacemaker) at boot time.
>> 3. Wait for all nodes. Have all nodes joined?
>>      No. Go to step 3.
>>      Yes. Go to step 4.
>> 4. Start resources.
>> 5. Split-brain situation (something goes wrong with the connection between
>> the nodes).
>> 6. The fencing agent on one of the nodes reboots the other node (there is a
>> configured delay on one of the fencing agents).
>> 7. The rebooted node goes to step 1.
>> There are two (or more?) important things to note in this configuration:
>> 1. The rebooted node keeps waiting for all nodes to become visible (the
>> connection has to be restored first).
>> 2. Suppose the connection problem still exists and the node which rebooted
>> the other one has to be rebooted as well (for some reason). After the
>> reboot it is also stuck at step 3 because of the connection problem.
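>> For reference, the quorum part of corosync.conf for this variant would look
>> roughly like this (a minimal sketch of the votequorum settings only):
>>
>>     quorum {
>>         provider: corosync_votequorum
>>         two_node: 1
>>         wait_for_all: 1
>>     }
>>
>> (With "two_node: 1", wait_for_all is enabled by default anyway, but setting
>> it explicitly makes the intent clear.)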
>> -----------------
>> Is it possible to somehow assign the node that won the reboot race (i.e.
>> rebooted the other one) a status like "primary" and allow it not to wait
>> for all nodes after a reboot, and then drop this status once the other
>> node has joined again?
>> So is it possible?
>> Right now that's the only configuration I know for a 2-node cluster.
>> Other variants would be very much appreciated =)
>> VARIANT 2 (not implemented, just a suggestion):
>> -----------------
>> I've been thinking about using an external SSD drive (or another external
>> drive). For example, a fencing agent could reserve the SSD using a SCSI
>> command and only then reboot the other node.
>> The main idea is that the first node, as soon as the cluster starts on it,
>> reserves the SSD until the other node joins the cluster; after that the
>> SCSI reservation is removed. (A rough command-level sketch follows the
>> list below.)
>> 1. Node starts.
>> 2. Cluster starts (Corosync and Pacemaker) at boot time.
>> 3. Reserve the SSD. Did it manage to reserve it?
>>      No. Don't start resources (wait for all).
>>      Yes. Go to step 4.
>> 4. Start resources.
>> 5. Remove the SCSI reservation once the other node has joined.
>> 6. Split-brain situation (something goes wrong with the connection between
>> the nodes).
>> 7. The fencing agent tries to reserve the SSD. Did it manage to reserve it?
>>      No. Maybe put the node in standby mode ...
>>      Yes. Reboot the other node.
>> 8. Optional: a single node can keep the SSD reservation as long as it is
>> alone in the cluster, or until it shuts down.
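>> A rough command-level sketch of the reservation step, using sg_persist from
>> sg3_utils (the device path and key are made-up examples):
>>
>>     # register this node's key on the shared SSD
>>     sg_persist --out --register --param-sark=0x1 /dev/sdx
>>     # try to take the reservation (type 5 = write exclusive, registrants only)
>>     sg_persist --out --reserve --param-rk=0x1 --prout-type=5 /dev/sdx
>>
>> The existing fence_scsi agent already does fencing via SCSI-3 persistent
>> reservations, so it might be a useful starting point for this variant.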
>> I am really looking forward to finding the best solution (or a couple of
>> them =)).
>> Hope I am not the only person who is interested in this topic.
>> Thank you,
>> Kostya
> Hi Kostya,
>   I only build 2-node clusters, and I've not had problems with this going
> back to 2009 over dozens of clusters. The tricks I found are:
> * Disable quorum (of course)
> * Set up good fencing, and add a delay to the node you prefer (or pick one
> at random, if they are of equal value) to avoid dual fences
> * Disable the cluster on start-up, to prevent fence loops (a rough pcs
> sketch of these three steps follows).
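> For reference, with pcs that would look roughly like this (a sketch only;
> the fence agent, addresses and delay value are examples and need to be
> adapted to your hardware):
>
>     # "disable quorum" from pacemaker's point of view
>     pcs property set no-quorum-policy=ignore
>     pcs property set stonith-enabled=true
>
>     # fencing for both nodes; the delay goes only on the device that fences
>     # the preferred node, so that node wins the fence race
>     pcs stonith create fence-node1 fence_ipmilan ipaddr=10.20.0.1 \
>         login=admin passwd=secret pcmk_host_list=node1 delay=15
>     pcs stonith create fence-node2 fence_ipmilan ipaddr=10.20.0.2 \
>         login=admin passwd=secret pcmk_host_list=node2
>
>     # don't start the cluster automatically at boot
>     chkconfig corosync off; chkconfig pacemaker off
>     # (or: systemctl disable corosync pacemaker)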
>   That's it. With this, your 2-node cluster will be just fine.
>   As for your question: once a node is fenced successfully, the resource
> manager (pacemaker) will take over any services lost on the fenced node, if
> that is how you configured it. A node that either gracefully leaves or
> dies/is fenced should not interfere with the remaining node.
>   The problem is when a node vanishes and fencing fails. Then, not knowing
> what the other node might be doing, the only safe option is to block,
> otherwise you risk a split-brain. This is why fencing is so important.
> Cheers
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org