<p dir="ltr">We finally added a third node, and fencing now works well when one of the nodes fails. For this configuration on SLES 11 SP3 I had to set no-quorum-policy to freeze.</p>
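
<p dir="ltr">For reference, here is a rough sketch of the majority arithmetic that makes the third node useful. It is an illustration only, not taken from our actual configuration: has_quorum is a made-up helper name, and the commented crm line assumes the crm shell is what you use to set cluster properties.</p>

<pre>
```shell
# has_quorum TOTAL VISIBLE
# Succeeds when VISIBLE nodes form a strict majority of TOTAL nodes.
# (Hypothetical helper for illustration; the cluster does this itself.)
has_quorum() {
    total=$1
    visible=$2
    [ $((visible * 2)) -gt "$total" ]
}

# Two nodes, link cut: each side sees 1 of 2, so neither side has a majority.
has_quorum 2 1 || echo "2-node split: no majority on either side"

# Three nodes, one isolated: the pair has a majority, the lone node does not.
has_quorum 3 2 && echo "3-node pair: has quorum"
has_quorum 3 1 || echo "3-node loner: no quorum"

# Setting the policy mentioned above (run once, on one cluster node):
# crm configure property no-quorum-policy=freeze
```
</pre>

<p dir="ltr">With three nodes, only the side that still sees a majority keeps quorum, which is why a single disconnected node no longer takes the others down with it.</p>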

<div class="gmail_quote">On Jun 10, 2015 9:21 AM, &quot;Digimer&quot; &lt;<a href="mailto:lists@alteeve.ca">lists@alteeve.ca</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 10/06/15 04:11 AM, Jonathan Vargas wrote:<br>

&gt; Thanks Digimer,<br>

&gt;<br>

&gt; I read an old post where you mention the configuration. However, even after<br>

&gt; adding &quot;start-delay=15&quot; to my stonith resource, both nodes still reboot at<br>

&gt; the same time on network disconnect.<br>

<br>

Not &#39;start-delay&#39;, just &#39;delay&#39;.<br>

<br>

&gt; This is my current configuration after the &quot;start-delay&quot; change:<br>

&gt;<br>

&gt; <a href="http://i.imgur.com/1o5bGvj.png" rel="noreferrer" target="_blank">http://i.imgur.com/1o5bGvj.png</a><br>

&gt;<br>

&gt; And this is the status of the cluster:<br>

&gt;<br>

&gt; <a href="http://i.imgur.com/TJNsHVD.png" rel="noreferrer" target="_blank">http://i.imgur.com/TJNsHVD.png</a><br>

&gt;<br>

&gt; I don&#39;t have a hardware stonith device, so I think the Linux watchdog is<br>

&gt; being used. Is it OK for the stonith resource to be placed on a single node?<br>

<br>

I&#39;ve not used it.<br>

<br>

The test, though, is to see whether fencing works when you crash each<br>

machine (echo c &gt; /proc/sysrq-trigger) and when the machine is alive<br>

but its network has failed.<br>
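
<br>

A minimal sketch of those two tests, wrapped in functions so that nothing runs until it is called. The function names and the default interface (eth0) are assumptions for illustration; substitute the interface that carries cluster traffic.<br>

<pre>
```shell
# Hypothetical helpers; the names and the default interface are assumptions.

# Test 1: hard-crash the kernel on the node that should end up fenced.
crash_node() {
    echo c > /proc/sysrq-trigger
}

# Test 2: leave the node running but take down its cluster-facing interface.
# Run this from the node's console, since it can cut an SSH session.
fail_network() {
    ip link set dev "${1:-eth0}" down
}

# On a surviving node: print the cluster status once to see node and resource state.
check_cluster() {
    crm_mon -1
}
```
</pre>

Run crash_node or fail_network as root on the node under test, then check_cluster on a peer, and repeat with the roles swapped.<br>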

<br>

&gt; Any idea what I should fix?<br>

&gt;<br>

&gt; Thanks in advance.<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; 2015-06-10 0:27 GMT-06:00 Digimer &lt;<a href="mailto:lists@alteeve.ca">lists@alteeve.ca</a>&gt;:<br>

&gt;<br>

&gt;     On 10/06/15 01:50 AM, Jonathan Vargas wrote:<br>

&gt;     &gt;<br>

&gt;     &gt; 2015-06-09 23:26 GMT-06:00 Digimer &lt;<a href="mailto:lists@alteeve.ca">lists@alteeve.ca</a>&gt;:<br>

&gt;     &gt;<br>

&gt;     &gt;     On 10/06/15 01:19 AM, Jonathan Vargas wrote:<br>

&gt;     &gt;     &gt; Thanks Andrei, Digimer.<br>

&gt;     &gt;     &gt;<br>

&gt;     &gt;     &gt; I see. Since I need to steer this discussion toward a definitive<br>

&gt;     &gt;     &gt; solution, I am sharing a diagram of how we are designing this HA<br>

&gt;     &gt;     &gt; architecture, to clarify the problem we are trying to solve:<br>

&gt;     &gt;     &gt;<br>

&gt;     &gt;     &gt; <a href="http://i.imgur.com/BFPcZSx.png" rel="noreferrer" target="_blank">http://i.imgur.com/BFPcZSx.png</a><br>

&gt;     &gt;<br>

&gt;     &gt;     Last block is DRBD. If DRBD will be managed by the cluster, it must have<br>

&gt;     &gt;     fencing.<br>

&gt;     &gt;<br>

&gt;     &gt;     This is your definitive answer.<br>

&gt;     &gt;<br>

&gt;     &gt;     Without it, you *will* get a split-brain. That leads to, at best, data<br>

&gt;     &gt;     divergence or data loss.<br>

&gt;     &gt;<br>

&gt;     &gt;     &gt; The first layer, Load Balancer, and the third layer, Database, are both<br>

&gt;     &gt;     &gt; already set up. The Load Balancer cluster uses only a VIP resource,<br>

&gt;     &gt;     &gt; while the Database cluster uses DRBD+VIP resources. They are in production<br>

&gt;     &gt;     &gt; and work fine, test passed :-)<br>

&gt;     &gt;     &gt;<br>

&gt;     &gt;     &gt; Now we are handling the Web Server layer, which I am discussing with<br>

&gt;     &gt;     &gt; experts like you. These servers all need to be active and see the<br>

&gt;     &gt;     &gt; same data for read &amp; write, as quickly as possible, mainly reads.<br>

&gt;     &gt;     &gt;<br>

&gt;     &gt;     &gt; *So, if we stay with OCFS2:* Since we need to protect service<br>

&gt;     &gt;     &gt; availability and keep most nodes up, what choices do I have to avoid<br>

&gt;     &gt;     &gt; reboots of both Web nodes caused by a split-brain situation when one of<br>

&gt;     &gt;     &gt; them is disconnected from the network?<br>

&gt;     &gt;<br>

&gt;     &gt;     None of this matters relative to the importance of working, tested<br>

&gt;     &gt;     fencing for replicated storage.<br>

&gt;     &gt;<br>

&gt;     &gt;     In any HA setup, the reboot of a node should matter not. If you are<br>

&gt;     &gt;     afraid of rebooting a node, you need to reconsider your design.<br>

&gt;     &gt;<br>

&gt;     &gt;<br>

&gt;     &gt;<br>

&gt;     &gt; Well, the problem is caused by a pretty common scenario: a simple<br>

&gt;     &gt; network disconnection on node 1 causes both nodes to reboot. Even while<br>

&gt;     &gt; node 1 is still offline, it keeps rebooting the active node 2.<br>

&gt;     &gt; There were no disk issues, but service availability was lost.<br>

&gt;     &gt; *That&#39;s the main complaint now :-/*<br>

&gt;<br>

&gt;     This is a symptom of a configuration issue. It is a separate topic from<br>

&gt;     whether or not to use fencing.<br>

&gt;<br>

&gt;     First, don&#39;t start the cluster when the node boots.<br>

&gt;<br>

&gt;     A node will boot for one of two reasons only;<br>

&gt;<br>

&gt;     1. Node was fenced; You don&#39;t want it back into the cluster until you<br>

&gt;     know it is safe to do so.<br>

&gt;<br>

&gt;     2. Scheduled maintenance; A human is there, so rejoining it after the<br>

&gt;     maintenance is over is a non-issue.<br>

&gt;<br>

&gt;     This solves the fence-on-boot issue. Also, corosync&#39;s wait_for_all<br>

&gt;     should be used to further protect against this.<br>

&gt;<br>

&gt;     If the problem is that both fence before they die, then set a delay<br>

&gt;     against a node to give it a head-start in fencing the peer. I find<br>

&gt;     delay=&quot;15&quot; to be a good value.<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; Okay. That will solve the problem of one node fencing the other<br>

&gt; after reboots. But it will require manual intervention to make the<br>

&gt; service available again.<br>

&gt;<br>

&gt; What if I disable fencing altogether and keep syncing a local copy of the<br>

&gt; data to each node&#39;s own disk?<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;     &gt;     &gt; Correct me if I&#39;m wrong:<br>

&gt;     &gt;     &gt;<br>

&gt;     &gt;     &gt; *1. Redundant Channel:* This is pretty difficult, since we would have to<br>

&gt;     &gt;     &gt; add two new physical network cards to the virtual machine hosts, and<br>

&gt;     &gt;     &gt; that changes the network configuration a lot in the virtualization platform.<br>

&gt;     &gt;<br>

&gt;     &gt;     High Availability must put priorities like hassle and cost second to<br>

&gt;     &gt;     what makes a system more resilient. If you choose not to spend the extra<br>

&gt;     &gt;     money or time, then you must accept the risks.<br>

&gt;     &gt;<br>

&gt;     &gt;<br>

&gt;     &gt;     &gt; *2. Three Node Cluster:* This is possible, but it will consume more<br>

&gt;     &gt;     &gt; resources. We could use the third node only for cluster communication, though,<br>

&gt;     &gt;     &gt; not for web processing, which would decrease its load.<br>

&gt;     &gt;<br>

&gt;     &gt;     Quorum is NOT a substitute for fencing. They solve different problems.<br>

&gt;     &gt;<br>

&gt;     &gt;     Quorum is a tool for when all nodes are behaving properly. Fencing is a<br>

&gt;     &gt;     tool for when a node is not behaving properly.<br>

&gt;     &gt;<br>

&gt;     &gt;<br>

&gt;     &gt;<br>

&gt;     &gt; Yes, but adding a 3rd node will help determine which node is<br>

&gt;     &gt; failing and which are not, so that the proper one is fenced. Right?<br>

&gt;<br>

&gt;     If you have a 3rd node and you fail the network on one, then in theory,<br>

&gt;     yes it will help. In practice, if you down the network on one node, it<br>

&gt;     won&#39;t be able to fence the other node anyway and will be the fence<br>

&gt;     victim.<br>

&gt;<br>

&gt;     &gt;     &gt; *3. Disable Fencing:* You said this should not happen at all if we use a<br>

&gt;     &gt;     &gt; shared disk like OCFS. So I am discarding it.<br>

&gt;     &gt;<br>

&gt;     &gt;     Correct.<br>

&gt;     &gt;<br>

&gt;     &gt;     &gt; *4. Use NFS:* Yes, this will cause a SPoF, and to solve it we would have<br>

&gt;     &gt;     &gt; to set up another cluster with DRBD as described here<br>

&gt;     &gt;     &gt; &lt;<a href="https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha_techguides/book_sleha_techguides.html" rel="noreferrer" target="_blank">https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha_techguides/book_sleha_techguides.html</a>&gt;,<br>

&gt;     &gt;     &gt; and add more infrastructure resources. Or can we set up NFS over OCFS2?<br>

&gt;     &gt;<br>

&gt;     &gt;     ... Which would require fencing anyway, so you gain nothing but another<br>

&gt;     &gt;     layer of things to break. First rule of HA; Keep it simple.<br>

&gt;     &gt;<br>

&gt;     &gt;     Complexity is the enemy of availability.<br>

&gt;     &gt;<br>

&gt;     &gt;<br>

&gt;     &gt;<br>

&gt;     &gt; Sure, fencing would have to be added too if that were the case.<br>

&gt;<br>

&gt;     Fencing is always needed in HA clusters, full stop.<br>

&gt;<br>

&gt;<br>

&gt;     --<br>

&gt;     Digimer<br>

&gt;     Papers and Projects: <a href="https://alteeve.ca/w/" rel="noreferrer" target="_blank">https://alteeve.ca/w/</a><br>

&gt;     What if the cure for cancer is trapped in the mind of a person without<br>

&gt;     access to education?<br>

&gt;<br>

&gt;     _______________________________________________<br>

&gt;     Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>

&gt;     <a href="http://clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>

&gt;<br>

&gt;     Project Home: <a href="http://www.clusterlabs.org" rel="noreferrer" target="_blank">http://www.clusterlabs.org</a><br>

&gt;     Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" rel="noreferrer" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>

&gt;     Bugs: <a href="http://bugs.clusterlabs.org" rel="noreferrer" target="_blank">http://bugs.clusterlabs.org</a><br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;<br>


&gt;<br>

<br>

<br>

--<br>

Digimer<br>

Papers and Projects: <a href="https://alteeve.ca/w/" rel="noreferrer" target="_blank">https://alteeve.ca/w/</a><br>

What if the cure for cancer is trapped in the mind of a person without<br>

access to education?<br>

<br>


</blockquote></div>