[ClusterLabs] [EXTERNE] Re: Centreon HA Cluster - VIP issue

Adil BOUAZZAOUI adil.bouazzaoui at tmandis.ma
Fri Sep 8 12:14:34 EDT 2023


Hi Ken,

Thank you for the update and the clarification.
The idea is clear; I just need to know more about this two-cluster setup:

1. Arbitrator:
1.1. You said only one arbitrator is needed for everything: should I use the quorum device described in the official Centreon documentation, or the booth ticket manager instead?
1.2. Is fencing configured separately, or is it configured during the booth ticket manager installation?

2. Floating IP:
2.1. It doesn't hurt if both floating IPs are running at the same time, right?

3. Failover:
3.1. How do we update the DNS to point to the appropriate IP?
3.2. We're running our own DNS servers, so how do we configure a booth ticket for just the DNS resource?

4. MariaDB replication:
4.1. How can Centreon's MariaDB replicate between the 2 clusters?

5. Centreon:
5.1. Will this setup (2 clusters, 2 floating IPs, 1 booth arbitrator) work for our Centreon project? A sketch of the booth configuration I have in mind follows below.
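
For reference, here is roughly the booth configuration I picture,
pieced together from the booth documentation (all IPs are
placeholders):

    # /etc/booth/booth.conf -- same file on both sites and the arbitrator
    transport="UDP"
    port="9929"
    # DC 1 and DC 2 floating IPs
    site="172.30.9.240"
    site="172.30.10.240"
    # lightweight third site, e.g. a small cloud VM
    arbitrator="192.168.253.250"
    ticket="ticket-centreon"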



Regards
Adil Bouazzaoui


Adil BOUAZZAOUI
Infrastructure & Technologies Engineer
GSM: +212 703 165 758
E-mail: adil.bouazzaoui at tmandis.ma


-----Original Message-----
From: Ken Gaillot [mailto:kgaillot at redhat.com]
Sent: Tuesday, September 5, 2023 10:00 PM
To: Adil Bouazzaoui <adilb574 at gmail.com>
Cc: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Subject: [EXTERNE] Re: [ClusterLabs] Centreon HA Cluster - VIP issue

On Tue, 2023-09-05 at 21:13 +0100, Adil Bouazzaoui wrote:
> Hi Ken,
> 
> thank you big time for the feedback; much appreciated.
> 
> I suppose we go with a new Scenario 3: set up 2 clusters across
> different DCs connected by booth. Could you please clarify the points
> below so I can understand better and start working on the
> architecture:
> 
> 1- In the case of separate clusters connected by booth: should each
> cluster have a quorum device for the master/slave elections?

Hi,

Only one arbitrator is needed for everything.

Since each cluster in this case has two nodes, Corosync will use the "two_node" configuration to determine quorum. When first starting the cluster, both nodes must come up before quorum is obtained. After that, only one node is required to keep quorum -- which means that fencing is essential to prevent split-brain.
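
For reference, two_node mode corresponds to this quorum section in
/etc/corosync/corosync.conf (two_node implicitly enables wait_for_all,
which is why both nodes are required at first startup):

    quorum {
        provider: corosync_votequorum
        two_node: 1
    }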

> 2- separate floating IPs at each cluster: please check the attached 
> diagram and let me know if this is exactly what you mean?

Yes, that looks good.

> 3- To fail over, you update the DNS to point to the appropriate IP:
> can you suggest any guide we could follow to have the DNS updated
> automatically?

Unfortunately I don't know of any. If your DNS provider offers an API of some kind, you can write a resource agent that uses it. If you're running your own DNS servers, the agent has to update the zone files appropriately and reload.
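
As a rough sketch of the self-hosted case: if your DNS servers accept
RFC 2136 dynamic updates, the agent could run something like the
following instead of editing zone files by hand (server, zone, key
file, and addresses are all placeholders):

    # Repoint the monitoring name at this site's floating IP with a
    # TSIG-signed dynamic update
    nsupdate -k /etc/named/monitor-update.key <<'EOF'
    server ns1.example.com
    zone example.com
    update delete monitor.example.com. A
    update add monitor.example.com. 60 A 172.30.9.240
    send
    EOF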

Depending on what your services are, it might be sufficient to use a booth ticket for just the DNS resource, and let everything else stay running all the time. For example, it doesn't hurt anything for both sites' floating IPs to stay up.
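
With pcs, tying only a hypothetical DNS-updating resource to a booth
ticket might look like this (ticket and resource names made up):

    pcs booth ticket add ticket-dns
    pcs constraint ticket add ticket-dns dns-update loss-policy=stop

Everything else, floating IPs included, would simply have no ticket
constraint and keep running at both sites.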

> Regards
> Adil Bouazzaoui
> 
> On Tue, Sep 5, 2023 at 16:48, Ken Gaillot <kgaillot at redhat.com>
> wrote:
> > Hi,
> > 
> > The scenario you describe is still a challenging one for HA.
> > 
> > A single cluster requires low latency and reliable communication. A 
> > cluster within a single data center or spanning data centers on the 
> > same campus can be reliable (and appears to be what Centreon has in 
> > mind), but it sounds like you're looking for geographical 
> > redundancy.
> > 
> > A single cluster isn't appropriate for that. Instead, separate 
> > clusters connected by booth would be preferable. Each cluster would 
> > have its own nodes and fencing. Booth tickets would control which 
> > cluster could run resources.
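> > 
> > As a rough sketch, bootstrapping that with pcs could look like this
> > (placeholder IPs; run on one node of one cluster, then copy the
> > generated config to the other site and the arbitrator):
> > 
> >     pcs booth setup sites 172.30.9.240 172.30.10.240 \
> >         arbitrators 192.168.253.250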
> > 
> > Whatever design you use, it is pointless to put a quorum
> > tie-breaker at one of the data centers. If that data center becomes
> > unreachable, the other one can't recover resources. The tie-breaker 
> > (qdevice for a single cluster or a booth arbitrator for multiple 
> > clusters) can be very lightweight, so it can run in a public cloud 
> > for example, if a third site is not available.
> > 
> > The IP issue is separate. For that, you will need separate floating 
> > IPs at each cluster, on that cluster's network. To fail over, you 
> > update the DNS to point to the appropriate IP. That is a tricky 
> > problem without a universal automated solution. Some people update 
> > the DNS manually after being alerted of a failover. You could write 
> > a custom resource agent to update the DNS automatically. Either way 
> > you'll need low TTLs on the relevant records.
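> > 
> > For instance, the record being flipped would need a short TTL, on
> > the order of a minute (hypothetical name and address):
> > 
> >     monitor.example.com. 60 IN A 172.30.9.240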
> > 
> > On Sun, 2023-09-03 at 11:59 +0000, Adil BOUAZZAOUI wrote:
> > > Hello,
> > >  
> > > My name is Adil, I’m working for Tman company. We are testing the
> > > Centreon HA cluster to monitor our infrastructure for 13 companies;
> > > for now we are using the 100 IT license to test the platform. Once
> > > everything is working fine, we can purchase a license suitable for
> > > our case.
> > >  
> > > We're stuck at scenario 2: setting up the Centreon HA cluster with
> > > Master & Slave in different datacenters.
> > > Scenario 1, setting up the cluster with Master & Slave and the VIP
> > > address on the same network (VLAN), is working fine.
> > >  
> > > Scenario 1: Cluster on same network (same DC) ==> works fine
> > > Master in DC 1 VLAN 1: 172.30.9.230/24
> > > Slave in DC 1 VLAN 1: 172.30.9.231/24
> > > VIP in DC 1 VLAN 1: 172.30.9.240/24
> > > Quorum in DC 1 LAN: 192.168.253.230/24
> > > Poller in DC 1 LAN: 192.168.253.231/24
> > >  
> > > Scenario 2: Cluster on different networks (2 separate DCs connected
> > > with VPN) ==> still not working
> > > Master in DC 1 VLAN 1: 172.30.9.230/24
> > > Slave in DC 2 VLAN 2: 172.30.10.230/24
> > > VIP: example 102.84.30.XXX. We used a public static IP from our
> > > internet service provider; we thought that using an IP from a site
> > > network wouldn't work, since if the site goes down then the VIP
> > > won't be reachable!
> > > Quorum: 192.168.253.230/24
> > > Poller: 192.168.253.231/24
> > >  
> > >  
> > > Our goal is to have the Master & Slave nodes on different sites, so
> > > that when Site A goes down, we keep monitoring with the slave.
> > > The problem is that we don't know how to set up the VIP address,
> > > what kind of VIP address will work, or how the VIP address can work
> > > in this scenario, or whether there is anything else that can
> > > replace the VIP address to make things work.
> > > Also, can we use a backup poller, so that if poller 1 on Site A
> > > goes down, poller 2 on Site B can take the lead?
> > >  
> > > We looked everywhere (The watch, YouTube, Reddit, GitHub...), and
> > > we still couldn't find a workaround!
> > >  
> > > The guide we used to deploy the 2-node cluster:
> > > https://docs.centreon.com/docs/installation/installation-of-centreon-ha/overview/
> > >  
> > > Attached are the 2 DCs architecture example and most of the
> > > required screenshots/config.
> > >  
> > >  
> > > We appreciate your support.
> > > Thank you in advance.
> > >  
> > >  
> > >  
> > > Regards
> > > Adil Bouazzaoui
> > >  
> > > Adil BOUAZZAOUI
> > > Infrastructure & Technologies Engineer
> > > GSM: +212 703 165 758
> > > E-mail: adil.bouazzaoui at tmandis.ma
> > >  
> > >  
> > > _______________________________________________
> > > Manage your subscription:
> > > https://lists.clusterlabs.org/mailman/listinfo/users
> > > 
> > > ClusterLabs home: https://www.clusterlabs.org/
--
Ken Gaillot <kgaillot at redhat.com>


