<div dir="ltr"><div><div>Hi guys,</div><div> </div><div>I want to gather all possible configuration variants for 2-node cluster, because it has a lot of pitfalls and there are not a lot of information across the internet about it. And also I have some questions about configurations and their specific problems.</div>
VARIANT 1:
-----------------
We can use the "two_node" and "wait_for_all" options from Corosync's votequorum, and set up fencing agents with a delay on one of them.
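For reference, a minimal sketch of the matching quorum section in corosync.conf (assuming Corosync 2.x votequorum; per votequorum(5), two_node: 1 already implies wait_for_all unless it is explicitly turned off):

    quorum {
        provider: corosync_votequorum
        two_node: 1
        # implied by two_node, but explicit is clearer
        wait_for_all: 1
    }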
Here is the workflow (diagram) of this configuration:

1. Node starts.
2. Cluster starts (Corosync and Pacemaker) at boot time.
3. Wait for all nodes. Have all nodes joined?
   No. Go to step 3.
   Yes. Go to step 4.
4. Start resources.
5. Split-brain situation (something wrong with the connection between the nodes).
6. The fencing agent on one of the nodes reboots the other node (there is a configured delay on one of the fencing agents; a sketch follows the notes below).
7. The rebooted node goes to step 1.

There are two (or more?) important things in this configuration:
1. The rebooted node keeps waiting for all nodes to be visible (the connection has to be restored first).
2. Suppose the connection problem still exists and the node which rebooted the other one has to be rebooted too (for some reason). After the reboot it is also stuck on step 3 because of the connection problem.
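As for the delay in step 6: the usual trick is to put the delay on the stonith device that targets the node you want to survive, so the other node's shot is held back and loses the race. A rough sketch with pcs (fence_ipmilan, the addresses and credentials are just placeholder assumptions; parameter names vary between fence agent versions):

    # Device that fences node1. The delay means node2 has to wait
    # 15 seconds before shooting node1, so node1 wins the race.
    pcs stonith create fence-node1 fence_ipmilan \
        pcmk_host_list="node1" ipaddr="10.0.0.1" \
        login="admin" passwd="secret" delay="15"

    # Device that fences node2. No delay, node1 shoots immediately.
    pcs stonith create fence-node2 fence_ipmilan \
        pcmk_host_list="node2" ipaddr="10.0.0.2" \
        login="admin" passwd="secret"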
QUESTION:
-----------------
Is it somehow possible to assign the node that won the reboot race (i.e. rebooted the other one) a status like "primary", and allow it not to wait for all nodes after a reboot? The status would then be dropped once the other node joins again.
So is it possible?

Right now that is the only configuration I know for a 2-node cluster.
Other variants are very much appreciated =)

VARIANT 2 (not implemented, just a suggestion):
-----------------
I have been thinking about using an external SSD drive (or another external drive). For example, the fencing agent could reserve the SSD with a SCSI command and only after that reboot the other node.
The main idea is that the first node, as soon as the cluster starts on it, reserves the SSD until the other node joins the cluster; after that the SCSI reservation is removed.

1. Node starts.
2. Cluster starts (Corosync and Pacemaker) at boot time.
3. Reserve the SSD (see the reservation sketch after this list). Did it manage to reserve it?
   No. Don't start resources (wait for all).
   Yes. Go to step 4.
4. Start resources.
5. Remove the SCSI reservation when the other node has joined.
6. Split-brain situation (something wrong with the connection between the nodes).
7. The fencing agent tries to reserve the SSD. Did it manage to reserve it?
   No. Maybe put the node in standby mode ...
   Yes. Reboot the other node.
8. Optional: a single node can keep the SSD reservation as long as it is alone in the cluster or until it shuts down.
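The reservation itself could be done with SCSI-3 persistent reservations via sg_persist from sg3_utils. A minimal sketch, assuming the shared drive is /dev/sdb and 0x1 is this node's (placeholder) registration key:

    # Register this node's key with the device
    sg_persist --out --register --param-sark=0x1 /dev/sdb

    # Try to take an exclusive-access reservation (PROUT type 3);
    # a reservation conflict (non-zero exit) means the other node holds it
    sg_persist --out --reserve --param-rk=0x1 --prout-type=3 /dev/sdb

    # Step 5: release the reservation once the other node has joined
    sg_persist --out --release --param-rk=0x1 --prout-type=3 /dev/sdb

This is close to what the existing fence_scsi agent does with registrant-only reservations, so that might also be worth a look.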
I am really looking forward to finding the best solution (or a couple of them =)).
I hope I am not the only person who is interested in this topic.

Thank you,
Kostya