<div dir="ltr">Hi Digimer,<div><br></div><div>Suppose I disabled to cluster on start up, but what about remaining node, if I need to reboot it?</div><div>So, even in case of connection lost between these two nodes I need to have one node working and providing resources.</div>
<div>How did you solve this situation?</div><div>Should there be a separate daemon that somehow checks the connection between the two nodes and decides whether to start Corosync and Pacemaker or to keep them down?</div></div><div class="gmail_extra">
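<div><br></div><div>(If I understand correctly, "disable the cluster on start-up" just means not enabling the stack at boot; the commands below are my guess at what that looks like, depending on the init system:)</div><div><br></div><div>    # systemd-based distros</div><div>    systemctl disable corosync pacemaker</div><div>    # older init-based distros</div><div>    chkconfig corosync off; chkconfig pacemaker off</div><div><br></div>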
<br clear="all"><div><div dir="ltr">Thank you,<div>Kostya</div></div></div>
<br><br><div class="gmail_quote">On Mon, Jun 23, 2014 at 4:34 PM, Digimer <span dir="ltr"><<a href="mailto:lists@alteeve.ca" target="_blank">lists@alteeve.ca</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb"><div class="h5">On 23/06/14 09:11 AM, Kostiantyn Ponomarenko wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi guys,<br>
I want to gather all the possible configuration variants for a 2-node cluster,<br>
because it has a lot of pitfalls and there is not much information<br>
about it on the internet. I also have some questions about the<br>
configurations and their specific problems.<br>
VARIANT 1:<br>
-----------------<br>
We can use "two_node" and "wait_for_all" option from Corosync's<br>
votequorum, and set up fencing agents with delay on one of them.<br>
Here is a workflow(diagram) of this configuration:<br>
1. Node starts.<br>
2. The cluster (Corosync and Pacemaker) starts at boot time.<br>
3. Wait for all nodes. Have all nodes joined?<br>
No. Go to step 3.<br>
Yes. Go to step 4.<br>
4. Start resources.<br>
5. Split-brain situation (something wrong with the connection between the nodes).<br>
6. The fencing agent on one of the nodes reboots the other node (there<br>
is a configured delay on one of the fencing agents).<br>
7. The rebooted node goes to step 1.<br>
There are two (or more?) important things in this configuration:<br>
1. The rebooted node keeps waiting for all nodes to become visible (the<br>
connection has to be restored first).<br>
2. Suppose the connection problem still exists and the node that rebooted<br>
the other one also has to be rebooted (for some reason). After the reboot<br>
it is also stuck at step 3 because of the connection problem.<br>
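For reference, here is roughly the votequorum part I mean (a sketch, not a tested config):<br>
<br>
    quorum {<br>
        provider: corosync_votequorum<br>
        two_node: 1<br>
        wait_for_all: 1<br>
    }<br>
<br>
plus a "delay" parameter on the stonith device that targets the preferred node, so that node survives a simultaneous fence race.<br>
<br>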
QUESTION:<br>
-----------------<br>
Is it somehow possible to assign the node that won the reboot race<br>
(i.e. rebooted the other one) a status like "primary", allow it not to wait<br>
for all nodes after a reboot, and drop this status once the other node<br>
has joined again?<br>
So is it possible?<br>
Right now that's the only configuration I know for a 2-node cluster.<br>
Other variants would be very much appreciated =)<br>
VARIANT 2 (not implemented, just a suggestion):<br>
-----------------<br>
I've been thinking about using an external SSD drive (or another external<br>
drive). So, for example, the fencing agent could reserve the SSD using a SCSI<br>
command and only then reboot the other node.<br>
The main idea is that the first node, as soon as the cluster starts on<br>
it, reserves the SSD until the other node joins the cluster; after that the<br>
SCSI reservation is removed (see the sketch after the workflow below).<br>
1. Node starts.<br>
2. The cluster (Corosync and Pacemaker) starts at boot time.<br>
3. Reserve the SSD. Did it manage to reserve it?<br>
No. Don't start resources (wait for all).<br>
Yes. Go to step 4.<br>
4. Start resources.<br>
5. Remove the SCSI reservation once the other node has joined.<br>
6. Split-brain situation (something wrong with the connection between the nodes).<br>
7. The fencing agent tries to reserve the SSD. Did it manage to reserve it?<br>
No. Maybe put the node in standby mode ...<br>
Yes. Reboot the other node.<br>
8. Optional: a single node can keep the SSD reservation as long as it is<br>
alone in the cluster, or until it shuts down.<br>
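Just to illustrate the reservation idea (the device path and keys are made up, and as far as I know the existing fence_scsi agent does something similar with SCSI-3 persistent reservations):<br>
<br>
    # register this node's key on the shared drive<br>
    sg_persist --out --register --param-sark=0xabc1 /dev/sdX<br>
    # try to take the reservation (type 5 = write exclusive, registrants only)<br>
    sg_persist --out --reserve --param-rk=0xabc1 --prout-type=5 /dev/sdX<br>
    # inspect the current keys and reservation<br>
    sg_persist --in --read-keys /dev/sdX<br>
    sg_persist --in --read-reservation /dev/sdX<br>
<br>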
I am really looking forward to finding the best solution (or a couple of<br>
them =)).<br>
I hope I am not the only person who is interested in this topic.<br>
<br>
<br>
Thank you,<br>
Kostya<br>
</blockquote>
<br></div></div>
Hi Kostya,<br>
<br>
I only build 2-node clusters, and I've not had problems with this going back to 2009 over dozens of clusters. The tricks I found are:<br>
<br>
* Disable quorum (of course)<br>
* Set up good fencing, and add a delay to the node you prefer (or pick one at random, if they are of equal value) to avoid dual fences<br>
* Disable the cluster on start-up, to prevent fence loops (rough example commands below).<br>
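<br>
In pcs terms that is roughly the following (a sketch; the agent and its parameters are only placeholders for whatever fencing you actually use, and delay=15 goes on the device that fences the node you prefer to survive):<br>
<br>
    pcs property set no-quorum-policy=ignore<br>
    pcs stonith create fence_node1 fence_ipmilan pcmk_host_list="node-1" ipaddr="10.0.0.1" login="admin" passwd="secret" delay=15<br>
    pcs stonith create fence_node2 fence_ipmilan pcmk_host_list="node-2" ipaddr="10.0.0.2" login="admin" passwd="secret"<br>
    chkconfig pacemaker off; chkconfig corosync off<br>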
<br>
That's it. With this, your 2-node cluster will be just fine.<br>
<br>
As for your question: once a node is fenced successfully, the resource manager (Pacemaker) will take over any services lost on the fenced node, if that is how you configured it. A node that either gracefully leaves, or dies and is fenced, should not interfere with the remaining node.<br>
<br>
The problem is when a node vanishes and fencing fails. Then, not knowing what the other node might be doing, the only safe option is to block, otherwise you risk a split-brain. This is why fencing is so important.<br>
<br>
Cheers<span class="HOEnZb"><font color="#888888"><br>
<br>
-- <br>
Digimer<br>
Papers and Projects: <a href="https://alteeve.ca/w/" target="_blank">https://alteeve.ca/w/</a><br>
What if the cure for cancer is trapped in the mind of a person without access to education?<br>
<br>
_______________________________________________<br>
Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org" target="_blank">Pacemaker@oss.clusterlabs.org</a><br>
<a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/<u></u>mailman/listinfo/pacemaker</a><br>
<br>
Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
</font></span></blockquote></div><br></div>