[ClusterLabs] Two-Node OCFS2 cluster keep rebooting each other

Jonathan Vargas jonathan.vargas at alkaid.cr
Wed Jun 10 01:19:48 EDT 2015

Thanks Andrei, Digimer.

I see. Since I need to bring this discussion to a definitive solution, I
am sharing a diagram of how we are designing this HA architecture, to
clarify the problem we are trying to solve:


The first layer, Load Balancer, and the third layer, Database, are both
already set up. The Load Balancer cluster uses only a VIP resource, while
the Database cluster uses DRBD+VIP resources. They are in production and
work fine, tests passed :-)

Now we are handling the Web Server layer, which I am discussing with
experts like you. These servers all need to be active and see the same
data for reads and writes, as quickly as possible, mainly reads.

*So, if we stay with OCFS2:* Since we need to protect service
availability and keep as many nodes up as possible, what choices do I have
to avoid reboots of both Web nodes caused by a split-brain situation when
one of them is disconnected from the network?

Correct me if I'm wrong:

*1. Redundant Channel:* This is pretty difficult, since we would have to
add two new physical network cards to the virtual machine hosts, and that
changes the network configuration a lot in the virtualization platform.

*2. Three Node Cluster:* This is possible, but it will consume more
resources. The third node could be used only for cluster communication
though, not for web processing, which would reduce its load.

*3. Disable Fencing:* You said this should never be done when using a
shared disk like OCFS2. So I am discarding it.

*4. Use NFS:* Yes, this will cause a SPoF, and to solve it we would have to
set up another cluster with DRBD as described here
and add more infrastructure resources. Or can we set up NFS over OCFS2?
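
To make the difference between a two-node and a three-node cluster
concrete, here is a minimal sketch of plain majority voting, assuming one
vote per node. The function names are hypothetical, and this is not how
Corosync, Pacemaker or OCFS2 implement quorum; it only illustrates the
arithmetic behind option 2.

```python
# Illustrative only: plain majority-vote quorum with one vote per node.
# Function names are hypothetical; this is not the Corosync/OCFS2 code.

def quorum_threshold(total_nodes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_nodes // 2 + 1


def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """True if a partition that can see `reachable_nodes` members
    (including itself) holds a strict majority of the cluster."""
    return reachable_nodes >= quorum_threshold(total_nodes)


if __name__ == "__main__":
    # Print every possible partition size for 2-node and 3-node clusters.
    for total in (2, 3):
        for reachable in range(1, total + 1):
            print(f"{total}-node cluster, partition sees {reachable}: "
                  f"quorum={has_quorum(reachable, total)}")
```

Under this simple rule, a network split in a two-node cluster leaves each
side with one vote out of two, so neither side holds a majority. With
three nodes, the side that still sees two members keeps a majority and the
isolated node does not.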

Thanks in advance.

2015-06-09 22:03 GMT-06:00 Andrei Borzenkov <arvidjaar at gmail.com>:

> On Tue, 9 Jun 2015 21:53:41 -0600
> Jonathan Vargas <jonathan.vargas at alkaid.cr> wrote:
> > Thanks,
> >
> > Those nodes do not need coordination between them. They have been working
> > so far until now without HA and OCFS2. A load balancer distributes the
> > requests between both nodes, they do not know about the existence of each
> > other.
> >
> > However, they do require shared storage to work with the same data.
> > Before setting up the OCFS2 cluster, we have been syncing disks using
> > rsync, but it syncs each minute, not real time.
> >
> > So, our requirement depends on OCFS2, and it works, but not on an HA
> > and stonith setup, I think. I see no way it could add value to the
> > required solution. Or does it?
> >
> You need coordination between nodes on write and even if you mount your
> system read-only you still have at least boot time journal replay. So
> no, your nodes cannot free run.
> You probably want to use NFS for this.
