<div dir="ltr">Hi Digimer,<div><br></div><div>Suppose I disable the cluster on start-up. What about the remaining node, if I need to reboot it?</div><div>Even if the connection between the two nodes is lost, I need to have one node working and providing resources.</div>


<div>How did you handle this situation?</div><div>Should there be a separate daemon that somehow checks the connection between the two nodes and decides whether to start corosync and pacemaker or keep them down?</div></div><div class="gmail_extra">


<br clear="all"><div><div dir="ltr">Thank you,<div>Kostya</div></div></div>

<br><br><div class="gmail_quote">On Mon, Jun 23, 2014 at 4:34 PM, Digimer <span dir="ltr">&lt;<a href="mailto:lists@alteeve.ca" target="_blank">lists@alteeve.ca</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class="HOEnZb"><div class="h5">On 23/06/14 09:11 AM, Kostiantyn Ponomarenko wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi guys,<br>

I want to gather all possible configuration variants for a 2-node cluster,<br>

because it has a lot of pitfalls and there is not much information<br>

on the internet about it. I also have some questions about<br>

the configurations and their specific problems.<br>

VARIANT 1:<br>

-----------------<br>

We can use the "two_node" and "wait_for_all" options from Corosync's<br>

votequorum, and set up fencing agents with a delay on one of them.<br>

Here is the workflow (diagram) of this configuration:<br>

1. Node starts.<br>

2. Cluster starts (Corosync and Pacemaker) at boot time.<br>

3. Wait for all nodes. All nodes joined?<br>

     No. Go to step 3.<br>

     Yes. Go to step 4.<br>

4. Start resources.<br>

5. Split-brain situation (something goes wrong with the connection between the nodes).<br>

6. The fencing agent on one of the nodes reboots the other node (there<br>

is a configured delay on one of the fencing agents).<br>

7. The rebooted node goes to step 1.<br>

There are two (or more?) important things in this configuration:<br>

1. The rebooted node keeps waiting until all nodes are visible (the connection<br>

has to be restored first).<br>

2. Suppose the connection problem still exists and the node that rebooted<br>

the other one also has to be rebooted (for some reason). After the reboot<br>

it is also stuck at step 3 because of the connection problem.<br>
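<br>

To make the workflow above concrete, here is a minimal Python sketch of how a node's quorum state could be modelled when "two_node" and "wait_for_all" are both in effect. The class TwoNodeQuorum and its fields are hypothetical names invented for illustration; they are not part of Corosync, and the model ignores fencing entirely.<br>

```python
from dataclasses import dataclass


@dataclass
class TwoNodeQuorum:
    """Toy model of votequorum with two_node and wait_for_all enabled.

    Hypothetical class for illustration only; it is not a Corosync API.
    """

    expected_nodes: int = 2
    seen_all_once: bool = False  # wait_for_all is satisfied once this flips

    def update(self, visible_nodes: int) -> bool:
        """Record how many nodes are visible (including this one) and
        return whether this node considers itself quorate."""
        if visible_nodes >= self.expected_nodes:
            self.seen_all_once = True
        # With two_node, a single vote is enough for quorum, but only
        # after wait_for_all has been satisfied since this node started.
        return self.seen_all_once and visible_nodes >= 1


# Steps 1-4: a freshly booted node waits until its peer is visible.
node = TwoNodeQuorum()
print(node.update(visible_nodes=1))  # False: still waiting at step 3
print(node.update(visible_nodes=2))  # True: resources may start

# Steps 5-6: the peer disappears; the survivor stays quorate.
print(node.update(visible_nodes=1))  # True

# Step 7: a reboot loses the in-memory state, so the node waits again.
rebooted = TwoNodeQuorum()
print(rebooted.update(visible_nodes=1))  # False
```

The last call shows the second point above: any node that reboots while the link is still down starts from scratch and cannot become quorate on its own.<br>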

QUESTION:<br>

-----------------<br>

Is it possible to somehow give the node that won the fence race<br>

(the one that rebooted the other node) a status like "primary", so that it does not have to wait<br>

for all nodes after a reboot? This status would be dropped once the other node<br>

has joined it.<br>

So, is it possible?<br>

Right now that's the only configuration I know for a 2-node cluster.<br>

Other variants are very welcome =)<br>

VARIANT 2 (not implemented, just a suggestion):<br>

-----------------<br>

I've been thinking about using an external SSD (or another external<br>

drive). For example, the fencing agent could reserve the SSD using a SCSI command<br>

and then reboot the other node.<br>

The main idea is that the first node, as soon as the cluster starts on<br>

it, reserves the SSD until the other node joins the cluster; after that, the SCSI<br>

reservation is removed.<br>

1. Node starts.<br>

2. Cluster starts (Corosync and Pacemaker) at boot time.<br>

3. Reserve the SSD. Did it manage to reserve it?<br>

     No. Don't start resources (wait for all).<br>

     Yes. Go to step 4.<br>

4. Start resources.<br>

5. Remove the SCSI reservation when the other node has joined.<br>

6. Split-brain situation (something goes wrong with the connection between the nodes).<br>

7. The fencing agent tries to reserve the SSD. Did it manage to reserve it?<br>

     No. Maybe put the node in standby mode ...<br>

     Yes. Reboot the other node.<br>

8. Optional: a single node can keep the SSD reservation for as long as it is alone in<br>

the cluster, or until it shuts down.<br>
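<br>

The reservation idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: SharedDisk and on_split_brain are hypothetical names, the "disk" is a plain in-memory object, and a real setup would have to issue actual SCSI reservation commands against the shared drive.<br>

```python
class SharedDisk:
    """Stand-in for the external SSD; holds at most one reservation.

    Hypothetical model for illustration, not a real SCSI interface.
    """

    def __init__(self):
        self.holder = None

    def reserve(self, node):
        """Try to reserve the disk; succeeds if it is free or already ours."""
        if self.holder in (None, node):
            self.holder = node
            return True
        return False

    def release(self, node):
        """Drop the reservation, but only if this node holds it."""
        if self.holder == node:
            self.holder = None


def on_split_brain(disk, node, peer):
    """Only the node that wins the reservation may fence its peer."""
    if disk.reserve(node):
        return f"{node} reboots {peer}"
    return f"{node} stands by"


disk = SharedDisk()
print(on_split_brain(disk, "node1", "node2"))  # node1 reboots node2
print(on_split_brain(disk, "node2", "node1"))  # node2 stands by
```

Because the reservation acts as a tie-breaker outside the network link, at most one node ever decides to fence, whichever asks first.<br>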

I am really looking forward to finding the best solution (or a couple of<br>

them =)).<br>

Hope I am not the only person who is interested in this topic.<br>

<br>

<br>

Thank you,<br>

Kostya<br>

</blockquote>

<br></div></div>

Hi Kostya,<br>

<br>

  I only build 2-node clusters, and I've not had problems with this going back to 2009 over dozens of clusters. The tricks I found are:<br>

<br>

* Disable quorum (of course)<br>

* Set up good fencing, and add a delay to the node you prefer (or pick one at random, if equal value) to avoid dual-fences<br>

* Disable the cluster on start-up, to prevent fence loops.<br>

<br>

  That's it. With this, your 2-node cluster will be just fine.<br>
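<br>

  As a rough illustration of why the fence delay avoids dual-fences, here is a small Python sketch of the fence race. It assumes the delay applies to fence actions that target the preferred node, so the peer has to wait before it may shoot. The function fence_race and the node names are hypothetical, not part of Pacemaker or any fence agent.<br>

```python
def fence_race(delay_before_fencing):
    """Return (survivor, fenced_nodes) for a two-node fence race.

    delay_before_fencing maps each node name to the number of seconds
    the peer must wait before fencing that node. Hypothetical helper
    for illustration, not a Pacemaker API.
    """
    a, b = delay_before_fencing  # unpacks the two node names
    if delay_before_fencing[a] == delay_before_fencing[b]:
        # Equal delays: both fence calls can fire together (dual fence).
        return None, (a, b)
    # The node protected by the longer delay gets to shoot first.
    survivor = max(delay_before_fencing, key=delay_before_fencing.get)
    fenced = b if survivor == a else a
    return survivor, (fenced,)


print(fence_race({"node1": 15, "node2": 0}))  # ('node1', ('node2',))
print(fence_race({"node1": 0, "node2": 0}))   # (None, ('node1', 'node2'))
```

  With a delay on only one side, the race has a predictable winner; with equal delays, both nodes may end up powered off.<br>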

<br>

  As for your question: once a node is fenced successfully, the resource manager (pacemaker) will take over any services lost on the fenced node, if that is how you configured it. A node that either leaves gracefully or dies and is fenced should not interfere with the remaining node.<br>


<br>

  The problem is when a node vanishes and fencing fails. Then, not knowing what the other node might be doing, the only safe option is to block, otherwise you risk a split-brain. This is why fencing is so important.<br>

<br>

Cheers<span class="HOEnZb"><font color="#888888"><br>

<br>

-- <br>

Digimer<br>

Papers and Projects: <a href="https://alteeve.ca/w/" target="_blank">https://alteeve.ca/w/</a><br>

What if the cure for cancer is trapped in the mind of a person without access to education?<br>

<br>

_______________________________________________<br>

Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org" target="_blank">Pacemaker@oss.clusterlabs.org</a><br>

<a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>

<br>

Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>

Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>

Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>

</font></span></blockquote></div><br></div>