[ClusterLabs] Two-Node OCFS2 cluster keep rebooting each other

Digimer lists at alteeve.ca
Wed Jun 10 15:19:36 UTC 2015


On 10/06/15 04:11 AM, Jonathan Vargas wrote:
> Thanks Digimer,
> 
> I read an old post where you mention the configuration. However, after
> adding "start-delay=15" to my stonith resource, both nodes still reboot
> at the same time on a network disconnect.

Not 'start-delay', just 'delay'.
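
For example, with the crm shell that might look roughly like this (a
sketch only; "stonith-sbd" is a placeholder for whatever your stonith
primitive is actually called, and your fence agent needs to support a
"delay" parameter; if it does not, Pacemaker's pcmk_delay_max attribute
can add a delay instead):

  # open the stonith primitive in an editor and move the value from
  # start-delay to the agent's delay parameter
  crm configure edit stonith-sbd

  # i.e. change something like
  #   params start-delay="15"
  # to
  #   params delay="15"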

> This is my current configuration after the "start-delay" change:
> 
> http://i.imgur.com/1o5bGvj.png
> 
> And this is the status of the cluster:
> 
> http://i.imgur.com/TJNsHVD.png
> 
> I don't have a hardware stonith device, so I think linux watchdog is
> being used. Is it OK that the stonith resource is placed on a single node?

I've not used it.

The test, though, is to see whether fencing works when you crash each
machine (echo c > /proc/sysrq-trigger) and when the machine is alive
but its network has failed.
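
Something like this, roughly (interface and node names are only
examples, adjust them to your environment):

  # test 1: hard-crash node 1; node 2 should fence it and keep running
  echo c > /proc/sysrq-trigger

  # test 2: with node 1 alive, fail only its cluster network, e.g.
  ifdown eth0
  # again, exactly one node should be fenced and services should
  # recover on the survivor

  # you can also trigger fencing directly to verify the agent itself
  stonith_admin --reboot node1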

> Any idea about what I should fix?
> 
> Thanks in advance.
> 
> 
> 
> 2015-06-10 0:27 GMT-06:00 Digimer <lists at alteeve.ca>:
> 
>     On 10/06/15 01:50 AM, Jonathan Vargas wrote:
>     >
>     > 2015-06-09 23:26 GMT-06:00 Digimer <lists at alteeve.ca>:
>     >
>     >     On 10/06/15 01:19 AM, Jonathan Vargas wrote:
>     >     > Thanks Andrei, Digimer.
>     >     >
>     >     > I see. Since I need to bring this discussion to a
>     definitive solution,
>     > I am sharing with you a diagram of how we are designing this HA
>     architecture,
>     >     > to clarify the problem we are trying to solve:
>     >     >
>     >     > http://i.imgur.com/BFPcZSx.png
>     >
>     >     The last block is DRBD. If DRBD is to be managed by the cluster, it
>     must have
>     >     fencing.
>     >
>     >     This is your definitive answer.
>     >
>     >     Without it, you *will* get a split-brain. That leads to, at
>     best, data
>     >     divergence or data loss.
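
(To make that concrete: with DRBD 8.x managed by Pacemaker, the usual
pattern is something like the following in the resource config. This is
a sketch only; "r0" is a placeholder, handler paths can differ by
distro, and the section the "fencing" keyword belongs in varies between
DRBD versions, so check drbd.conf(5).)

  resource r0 {
      disk {
          fencing resource-and-stonith;  # suspend I/O and fence on loss of the peer
      }
      handlers {
          fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
          after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
      }
      ...
  }
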
>     >
>     >     > The first layer, Load Balancer, and the third layer, Database, are
>     >     > both already set up. The Load Balancer cluster uses only a VIP
>     >     > resource, while the Database cluster uses DRBD+VIP resources. They
>     >     > are in production and work fine, tests passed :-)
>     >     >
>     >     > Now we are handling the Web Server layer, which I am
>     discussing with
>     >     > experts like you. These servers all need to be active and
>     see the
>     >     > same data for read & write, as quickly as possible, mainly
>     reads.
>     >     >
>     >     > *So, if we stay with OCFS2:* Since we need to protect service
>     >     > availability and keep most of the nodes up, what choices do I have
>     >     > to avoid reboots on both Web nodes caused by a split-brain situation
>     >     > when one of them is disconnected from the network?
>     >
>     >     None of this matters relative to the importance of working, tested
>     >     fencing for replicated storage.
>     >
>     >     In any HA setup, the reboot of a node should not matter. If
>     you are
>     >     afraid of rebooting a node, you need to reconsider your design.
>     >
>     >
>     >
>     > Well, the problem is caused by a pretty common scenario: a simple
>     > network disconnection on node 1 causes both nodes to reboot, and even
>     > while node 1 is still offline, it keeps rebooting the active node 2.
>     > There were no disk issues, but service availability was lost.
>     > *That's the main complaint now :-/*
> 
>     This is a symptom of a configuration issue. It is a separate topic from
>     using/not using fencing.
> 
>     First, don't start the cluster when the node boots.
> 
>     A node will boot for one of two reasons only:
> 
>     1. Node was fenced; you don't want it back in the cluster until you
>     know it is safe to do so.
> 
>     2. Scheduled maintenance; a human is there, so rejoining it after the
>     maintenance is over is a non-issue.
> 
>     This solves the fence-on-boot issue. Also, corosync's wait_for_all
>     should be used to further protect against this.
> 
>     If the problem is that both nodes fence each other before either one
>     dies, then set a delay against one node to give it a head start in
>     fencing the peer. I find delay="15" to be a good value.
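
Concretely, the boot/quorum part might look something like this (a
sketch, assuming corosync 2.x with votequorum and systemd; adjust for
your distro):

  # don't join the cluster automatically at boot
  systemctl disable pacemaker corosync

  # /etc/corosync/corosync.conf
  quorum {
      provider: corosync_votequorum
      two_node: 1       # two-node mode
      wait_for_all: 1   # after a full stop, wait until both nodes are seen
  }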
> 
> 
> 
> Okay. That will solve the problem of one node fencing the other one
> after reboots, but it will require manual intervention to make the
> service available again.
> 
> What if I disable fencing altogether and keep syncing a local copy of
> the data on each node's own disk?
> 
> 
>  
> 
>     >     > Correct me if I'm wrong:
>     >     >
>     >     > *1. Redundant Channel:* This is pretty difficult, since we would
>     >     have to
>     >     > add two new physical network cards to the virtual machine hosts, and
>     >     > that changes network configuration a lot in the virtualization platform.
>     >
>     >     In High Availability, concerns like hassle and cost must come second
>     >     to what makes the system more resilient. If you choose not to spend
>     >     the extra money or time, then you must accept the risks.
>     >
>     >
>     >     > *2. Three Node Cluster:* This is possible, but it will consume more
>     >     > resources. We can have it only for cluster communication though, not for
>     >     > web processing; that will reduce the load.
>     >
>     >     Quorum is NOT a substitute for fencing. They solve different problems.
>     >
>     >     Quorum is a tool for when all nodes are behaving properly. Fencing is a
>     >     tool for when a node is not behaving properly.
>     >
>     >
>     >
>     > Yes, but adding a 3rd node will help determine which node is failing
>     > and which are not, so that the proper one gets fenced. Right?
> 
>     If you have a 3rd node and you fail the network on one, then in theory,
>     yes it will help. In practice, if you down the network on one node, it
>     won't be able to fence the other node anyway and will be the fence
>     victim.
> 
>     >     > *3. Disable Fencing:* You said this should not happen at all if we
>     >     use a
>     >     > shared disk like OCFS. So I am discarding it.
>     >
>     >     Correct.
>     >
>     >     > *4. Use NFS:* Yes, this will create a SPoF, and to solve it we
>     >     > would have to set up another cluster with DRBD as described here
>     >     > <https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha_techguides/book_sleha_techguides.html>,
>     >     > and add more infrastructure resources. Or can we set up NFS over
>     >     > OCFS2?
>     >
>     >     ... Which would require fencing anyway, so you gain nothing but another
>     >     layer of things to break. First rule of HA: keep it simple.
>     >
>     >     Complexity is the enemy of availability.
>     >
>     >
>     >
>     > Sure, fencing would have to be added in that case, too.
> 
>     Fencing is always needed in HA clusters, full stop.
> 
> 
> 


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



