[ClusterLabs] pacemaker geo redundancy - 2 nodes

Tue Jul 16 04:10:00 EDT 2019

On 7/15/19 9:57 PM, Ken Gaillot wrote:
> On Mon, 2019-07-15 at 12:10 +0530, Rohit Saini wrote:
>> Hi All,
>>
>> I know pacemaker booth is being used for geographical redundancy.
>> Currently I am using pacemaker/corosync for my local two-node
>> redundancy.
>> As I understand, booth needs at least 3 nodes to work correctly to do
>> the automatic failovers. So it does not fit my requirements.
> Booth needs a third site, but it doesn't need to be a cluster node. It
> can be a lightweight host running just the booth arbitrator.
>
> However, you do need full clusters at each site, so at least two nodes
> at each site, plus the arbitrator host.
>
>> A few queries:
>> 1) Can I use my current pacemaker/corosync setup across my two
>> geographical nodes for automatic failovers, given that I am ready to
>> ignore split-brain scenarios? I believe I may need to tweak some
>> timers. Is this approach possible?
> Yes, this is sometimes referred to as a "stretched" or "metro" cluster.
> You can raise the corosync token timeout as needed to cover typical
> latencies. However, this is generally only recommended when the
> connection between the two sites is highly reliable and low latency.
>
> A lightweight host at a third site running qdevice can be used to
> provide true quorum.
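To make the quorum point concrete, here is a rough sketch of the vote
arithmetic. It assumes the plain majority rule that corosync's
votequorum uses and ignores special cases such as the two_node option;
the function names are made up for illustration.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(votes_seen: int, expected_votes: int) -> bool:
    """True if the partition seeing `votes_seen` votes holds quorum."""
    return votes_seen >= quorum_threshold(expected_votes)


# Two nodes, no qdevice: a lone survivor holds 1 of 2 votes.
print(is_quorate(1, 2))
# Two nodes plus a qdevice vote: survivor + qdevice hold 2 of 3 votes.
print(is_quorate(2, 3))
```

With only two votes a single surviving node can never reach a
majority; the third vote from the qdevice is what lets one site
continue on its own.
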
If you don't have a host at a third site where you can run
qdevice (or even booth), but a third site offers some kind of
disk-sharing service, you might think about using that as the
shared disk for SBD (storage-based death).
>
>> 2) Any disadvantages of going this way?
> Raising the token timeout will delay the response to actual
> node/network failures by the same amount.
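As a rough illustration of that delay, the sketch below follows my
reading of corosync.conf(5): the effective token timeout is
token + (nodes - 2) * token_coefficient, where the coefficient
(default 650 ms) only applies with three or more nodes in the
nodelist, and consensus defaults to 1.2 * token. The function names
and the 5000 ms value are just examples, not a recommendation.

```python
def effective_token_ms(token_ms: int, nodes: int,
                       token_coefficient_ms: int = 650) -> int:
    """Token timeout corosync actually applies for a given node count."""
    extra_nodes = max(nodes - 2, 0)  # coefficient applies from 3 nodes up
    return token_ms + extra_nodes * token_coefficient_ms


def default_consensus_ms(effective_token: int) -> int:
    """Default consensus timeout: 1.2 times the token timeout."""
    return effective_token * 12 // 10


token = effective_token_ms(5000, nodes=2)  # hypothetical raised token
consensus = default_consensus_ms(token)
# Rough time before a new membership can form after a failure.
print(token, consensus, token + consensus)
```

So in a two-node cluster, every millisecond added to the token shows
up directly in how long a real failure goes unnoticed, before fencing
and recovery can even start.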
>
> If you're thinking of doing it without fencing, the consequences of
> split brain depend on your workload. Something like a database or
> cluster filesystem could become horribly corrupted.
The provocative answer usually heard on this list, in one form or
another, is that if, after thinking about it, you still don't
feel you need fencing, you probably don't need a cluster ;-)

As losing one site might imply losing contact with the fencing
device associated with the node there, a watchdog solution
using SBD might be considered.

Klaus
>>
>> Thanks,
>> Rohit