[ClusterLabs] Two-Node OCFS2 cluster keep rebooting each other

Wed Jun 10 05:19:48 UTC 2015

Thanks Andrei, Digimer.

I see. Since I need to steer this discussion toward a definitive solution, I
am sharing a diagram of how we are designing this HA architecture, to
clarify the problem we are trying to solve:

http://i.imgur.com/BFPcZSx.png

The first layer, Load Balancer, and the third layer, Database, are both
already set up. The Load Balancer cluster uses only a VIP resource, while
the Database cluster uses DRBD+VIP resources. They are in production and
work fine, test passed :-)

Now we are handling the Web Server layer, which I am discussing with
experts like you. These servers all need to be active and see the same
data for reads and writes (mainly reads), as quickly as possible.

*So, if we stay with OCFS2:* Since we need to protect service
availability and keep most of the nodes up, what choices do I have to avoid
reboots of both Web nodes caused by a split-brain situation when one of
them is disconnected from the network?

Correct me if I'm wrong:

*1. Redundant Channel:* This is pretty difficult, since we would have to
add two new physical network cards to the virtual machine hosts, and that
changes the network configuration a lot in the virtualization platform.

*2. Three Node Cluster:* This is possible, but it will consume more
resources. The third node could take part only in cluster communication,
not in web processing, which would keep its load low.

*3. Disable Fencing:* You said this should not be done at all if we use a
shared-disk file system like OCFS2. So I am discarding it.

*4. Use NFS:* Yes, this would introduce a SPoF. To solve it we would have to
set up another cluster with DRBD, as described here
<https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha_techguides/book_sleha_techguides.html>,
and add more infrastructure resources. Or can we set up NFS over OCFS2?
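
To make the trade-off between the two-node setup and option 2 concrete, here
is a minimal sketch of plain majority-quorum arithmetic. It assumes one vote
per node and a strict-majority rule. The function names are hypothetical, and
it does not model any special two-node settings the cluster stack may offer,
nor fencing itself:

```python
def has_quorum(visible_nodes: int, total_nodes: int) -> bool:
    """True if a partition seeing `visible_nodes` votes holds a strict majority."""
    return visible_nodes > total_nodes // 2


def surviving_partitions(total_nodes: int, isolated: int = 1) -> dict:
    """Cut `isolated` nodes off from the rest and report which side keeps quorum."""
    rest = total_nodes - isolated
    return {
        "isolated_side": has_quorum(isolated, total_nodes),
        "remaining_side": has_quorum(rest, total_nodes),
    }


# Compare a two-node and a three-node cluster when one node loses the network.
for n in (2, 3):
    print(n, surviving_partitions(n))
```

Under these assumptions, neither half of a split two-node cluster holds a
majority, while in a three-node cluster the two nodes that still see each
other keep quorum and only the isolated node is a fencing candidate.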

Thanks in advance.

*Jonathan Vargas Rodríguez*
Founder and Solution Engineer
Alkaid <https://alkaid.cr/> | Open Source Software

* mail *  jonathan.vargas at alkaid.cr
 telf   +506 4001 6259 Ext. 01
 mobi   +506 4001 6259 Ext. 51

<http://linkedin.com/in/jonathanvargas/>
<https://plus.google.com/+JonathanVargas/>
<https://www.facebook.com/alkaid.cr>       <https://twitter.com/alkaidcr>

2015-06-09 22:03 GMT-06:00 Andrei Borzenkov <arvidjaar at gmail.com>:

> On Tue, 9 Jun 2015 21:53:41 -0600
> Jonathan Vargas <jonathan.vargas at alkaid.cr> wrote:
>
> > Thanks,
> >
> > Those nodes do not need coordination between them. They have worked
> > until now without HA and OCFS2. A load balancer distributes the
> > requests between both nodes, and they do not know about the existence of
> > each other.
> >
> > However, they do require shared storage to work with the same data.
> > Before setting up the OCFS2 cluster, we have been syncing disks using
> > rsync, but it syncs each minute, not real time.
> >
> > So, our requirement depends on OCFS2, which works, but not, I think, on
> > an HA and STONITH setup. I do not see how it could add value to the
> > required solution. Or does it?
> >
>
> You need coordination between nodes on write and even if you mount your
> system read-only you still have at least boot time journal replay. So
> no, your nodes cannot free run.
>
> You probably want to use NFS for this.
>