[ClusterLabs] Data Centers and Pacemaker
Ken Gaillot
kgaillot at redhat.com
Thu Nov 14 22:23:35 UTC 2024
On Thu, 2024-11-14 at 21:37 +0000, Angelo M Ruggiero via Users wrote:
> Hello,
>
> I have a long message. I am not sure if it is really appropriate
> for the Pacemaker group, but you guys have a lot of experience...
> Happy to be told it is not appropriate.
>
> We have two data centers about 13 km apart, connected by multiple
> dark fibres. Latency is about 0.2-0.3 ms; I forget if that is round
> trip or one way. I am not convinced that an extra fibre via an
> independent provider, to make the connection between the two sites
> redundant, would be financially possible. Plus there is a view that
> we should not rely on such things. We would like to move to more
> "modern" tech that has clustering built in, commodity hardware, and
> at some point a third data center.
>
> Initially I was thinking of just moving floating IP addresses
> between the two sites and running synchronous DB replication (it is
> a SAP installation, but not that relevant I think). But this
> view/approach does not work anymore. Then I realised that, as a
> company, we should not rely on the inter-site link, as said above.
>
> So instead we would replicate the DB synchronously within a site
> and asynchronously between sites.
>
> Doing this is a bit more complex, as I need to have an odd number
> of nodes on each side and ensure there is no single point of
> failure on either side.
> As an optimisation it would be OK for one site to be a "mini" site,
> still without a single point of failure, and to just switch back to
> the main site after a failover/takeover to the mini site, if you
> know what I mean.
> (The main site would then be 5 nodes in a Pacemaker cluster and the
> mini site 3 nodes in its cluster.)
>
> I do not worry about scaling out; we can just add nodes two at a
> time at both sites.
>
> Failover within a site would eventually be automated using
> Pacemaker, and planned takeovers between sites we could do by
> manually telling Pacemaker on both sides what to do. Obviously we
> would test all of this out, get it certified, etc.
>
> I hope I can use the term failover for unplanned and takeover for
> planned. 🙂 Our initial goal is to reduce planned downtime to zero
> (we do not have that now for upgrades, patching, etc.) and to move
> to RPO 0 and minimal RTO.
>
> As we do not have truly redundant networks, depending on quorum
> devices is not so good, because if the quorum device is lost the
> whole cluster goes down. And as I understand it you can only have
> one quorum device, so that is a SPOF. So instead I have an odd
> number of nodes in the Pacemaker cluster in each data center. For
> me that is OK, and somehow I think it is better than quorum
> devices.
>
> We use VMware (sigh...) and NetApp.
>
> In terms of fencing, we are trying to fence using industry
> standards, e.g. not going via the VMware management console, but
> rather more standard protocols, e.g. shared storage. I think I can
> make a good case for self-fencing using a watchdog, as I understand
> that is the minimum that SBD needs. I found that statement on a
> page on the old ClusterLabs website; I have not looked at the new
> one.
>
> So, my questions:
> Am I right that the quorum device is a single point of failure?
> Just out of interest.
No, the quorum device simply acts as an additional quorum vote. If you
have two nodes plus a quorum device, any two of them are sufficient
for quorum.
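To make that concrete, a rough sketch of the quorum device part of
corosync.conf on the cluster nodes might look like the following (the
host name and algorithm below are placeholders for illustration), with
corosync-qnetd running on the host that provides the extra vote and
corosync-qdevice on the cluster nodes:

    quorum {
        provider: corosync_votequorum
        device {
            model: net
            votes: 1
            net {
                # placeholder name for the host running corosync-qnetd
                host: qnetd.example.com
                # ffsplit favors the partition best able to survive a
                # 50/50 split; lms (last man standing) is the other
                # common choice
                algorithm: ffsplit
            }
        }
    }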
> If we ever want to somehow automate or semi-automate between data
> centers using Booth, is that a good idea? I looked a bit for
> documentation on Booth; I should look harder. But from gut feel, is
> Booth possible? Is there any alternative?
With a fast connection, you would be fine going with either a stretched
cluster (all nodes in both sites are in the same cluster) or a
multisite cluster (two separate clusters coordinated via booth).
With a stretched cluster, you need at least one server at each site.
With a multisite cluster, you need at least two, with some fencing
mechanism they can use to shoot each other (or sbd).
The problems in either case are:
* Quorum: you need a third site, but it can just have a quorum device
and nothing else. It could even be a cloud host. But without a third
site, neither of the others could know which one should take over if
they lose communication. (Quorum isn't as important within one site, as
long as fencing is configured properly, but it's a nice plus.)
* Fencing can't rely on the connection between the two sites, which
pretty much limits it to watchdog-based sbd.
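If you do try booth, the configuration itself is small. As a rough
sketch only (the addresses and ticket name below are made up),
/etc/booth/booth.conf is shared by both sites and the arbitrator:

    transport = UDP
    port = 9929
    # arbitrator at a third location (placeholder address)
    arbitrator = 192.0.2.100
    # one "site" line per cluster, typically a floating IP per site
    site = 192.0.2.101
    site = 192.0.2.102
    # a ticket that only one site may hold at a time
    ticket = "ticket-sap-db"
        expire = 600

Resources that must only run at the site holding the ticket then get a
ticket constraint in each cluster, for example with pcs (the resource
name here is just a placeholder):

    pcs constraint ticket add ticket-sap-db sap-db-group loss-policy=fence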
> Is watchdog-only fencing using SBD the absolute minimum?
Yes, that's a good choice when you don't have redundant paths. The side
that loses quorum will shut itself down. The side that retains quorum
will wait the configured time for that to happen then recover
resources.
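A watchdog-only ("diskless") sbd setup is roughly the following; the
values are examples, not tuned recommendations. In /etc/sysconfig/sbd,
leave SBD_DEVICE unset:

    # no SBD_DEVICE -> diskless, watchdog-only mode
    SBD_WATCHDOG_DEV=/dev/watchdog
    SBD_WATCHDOG_TIMEOUT=5
    SBD_PACEMAKER=yes

Then enable the sbd service on every node and tell Pacemaker how long
to assume self-fencing takes (this is the "configured time" the
surviving side waits before recovering):

    systemctl enable sbd
    pcs property set stonith-enabled=true
    # commonly about twice SBD_WATCHDOG_TIMEOUT
    pcs property set stonith-watchdog-timeout=10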
> Would you recommend, in addition to the watchdog, doing resource
> fencing, i.e. taking the storage away or virtually pulling the
> ethernet cable (not sure how that works though)? Or just node
> fencing in addition to the watchdog via some defined mechanism?
Watchdog is enough. Fabric fencing is problematic in a stretched or
multisite cluster. You could use power or fabric fencing within each
site, just for the nodes in that site to fence each other, but you'd
still need sbd to deal with the other site, so you might as well stick
to that. The only real advantage to two fencing devices is that the
power or fabric fencing might be quicker than the watchdog, so recovery
could be quicker for single-node failures within one site.
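If you did want fabric fencing within each site on top of sbd, one
common option with shared storage is SCSI persistent reservations via
fence_scsi, with one device per site restricted to that site's nodes.
Purely as an illustration (the device path and node names are made
up):

    pcs stonith create fence-site1 fence_scsi \
        devices=/dev/mapper/shared-lun-site1 \
        pcmk_host_list="node1a node1b node1c" \
        meta provides=unfencing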
> Using shared storage in SBD for poison pills, does that really give
> me anything? I cannot justify to myself that it does. Does it give
> anything else except poison pills?
> Have I forgotten a topic? 😉
> Sorry for typos and grammar mistakes; it is late over here.
>
> regards
> Angelo
>
--
Ken Gaillot <kgaillot at redhat.com>