[ClusterLabs] Users Digest, Vol 104, Issue 5
Adil Bouazzaoui
adilb574 at gmail.com
Mon Sep 4 16:28:47 EDT 2023
Hi Jan,
To add more information: we deployed a Centreon 2-node HA cluster (Master in
DC 1 and Slave in DC 2); the quorum device, which is responsible for resolving
split-brain, is in DC 1 too, and the poller, which is responsible for the
monitoring, is also in DC 1. The problem is that a VIP address is required
(attached to the Master node and moved to the Slave on failover) and we don't
know which VIP we should use. We also don't know the right setup for our
scenario, so that if DC 1 goes down the Slave in DC 2 becomes the Master;
that's why we don't know where to place the quorum device and the poller.
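For reference, in our working single-DC setup the VIP is just a plain cluster
IP resource, along these lines (a rough sketch with placeholder values, not
our exact configuration):

  pcs resource create centreon_vip ocf:heartbeat:IPaddr2 \
      ip=172.30.15.30 cidr_netmask=24 \
      op monitor interval=30s

That kind of address can only float between nodes that share a subnet; across
two DCs on different subnets it cannot simply "move", which is exactly the
open question above.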
I hope to get some ideas so we can set up this cluster correctly.
Thanks in advance.
Adil Bouazzaoui
IT Infrastructure engineer
adil.bouazzaoui at tmandis.ma
adilb574 at gmail.com
On Mon, Sep 4, 2023 at 15:24, <users-request at clusterlabs.org> wrote:
> Send Users mailing list submissions to
> users at clusterlabs.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.clusterlabs.org/mailman/listinfo/users
> or, via email, send a message with subject or body 'help' to
> users-request at clusterlabs.org
>
> You can reach the person managing the list at
> users-owner at clusterlabs.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Users digest..."
>
>
> Today's Topics:
>
> 1. Re: issue during Pacemaker failover testing (Klaus Wenninger)
> 2. Re: issue during Pacemaker failover testing (Klaus Wenninger)
> 3. Re: issue during Pacemaker failover testing (David Dolan)
> 4. Re: Centreon HA Cluster - VIP issue (Jan Friesse)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 4 Sep 2023 14:15:52 +0200
> From: Klaus Wenninger <kwenning at redhat.com>
> To: Cluster Labs - All topics related to open-source clustering
> welcomed <users at clusterlabs.org>
> Cc: David Dolan <daithidolan at gmail.com>
> Subject: Re: [ClusterLabs] issue during Pacemaker failover testing
> Message-ID:
> <CALrDAo0XqSRZ69LRArOPrLOOxwmCy1UuwqFPXsQzSC=
> WODyhTQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Mon, Sep 4, 2023 at 1:44 PM Andrei Borzenkov <arvidjaar at gmail.com>
> wrote:
>
> > On Mon, Sep 4, 2023 at 2:25 PM Klaus Wenninger <kwenning at redhat.com>
> > wrote:
> > >
> > >
> > > Or go for qdevice with LMS, where I would expect it to be able to really
> > > go down to a single node left - either of the last 2 - as there is
> > > still the qdevice.
> > > Sorry for the confusion, btw.
> > >
> >
> > According to documentation, "LMS is also incompatible with quorum
> > devices, if last_man_standing is specified in corosync.conf then the
> > quorum device will be disabled".
> >
>
> That is why I said qdevice with LMS - but it was probably not explicit
> enough without spelling out that I meant the qdevice algorithm and not
> the corosync flag.
>
> Klaus
>
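> A minimal sketch of that distinction (quorum section of corosync.conf; the
> qnetd hostname is a placeholder, not from the thread):
>
>   quorum {
>       provider: corosync_votequorum
>       # no last_man_standing flag here - "LMS" is chosen as the
>       # algorithm on the quorum device instead
>       device {
>           model: net
>           net {
>               host: qnetd.example.com
>               algorithm: lms
>           }
>       }
>   }
>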
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
> >
>
> ------------------------------
>
> Message: 2
> Date: Mon, 4 Sep 2023 14:32:39 +0200
> From: Klaus Wenninger <kwenning at redhat.com>
> To: Cluster Labs - All topics related to open-source clustering
> welcomed <users at clusterlabs.org>
> Cc: David Dolan <daithidolan at gmail.com>
> Subject: Re: [ClusterLabs] issue during Pacemaker failover testing
> Message-ID:
> <
> CALrDAo0V8BXp4AjWCobKeAE6PimvGG2xME6iA+OHxSHEsX90Ag at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Mon, Sep 4, 2023 at 1:50 PM Andrei Borzenkov <arvidjaar at gmail.com>
> wrote:
>
> > On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger <kwenning at redhat.com>
> > wrote:
> > >
> > >
> > >
> > > On Mon, Sep 4, 2023 at 12:45 PM David Dolan <daithidolan at gmail.com>
> > wrote:
> > >>
> > >> Hi Klaus,
> > >>
> > >> With default quorum options I've performed the following on my 3 node
> > >> cluster:
> > >>
> > >> Bring down cluster services on one node - the running services migrate
> > >> to another node
> > >> Wait 3 minutes
> > >> Bring down cluster services on one of the two remaining nodes - the
> > >> surviving node in the cluster is then fenced
> > >>
> > >> Instead of the surviving node being fenced, I hoped that the services
> > >> would migrate and run on that remaining node.
> > >>
> > >> Just looking for confirmation that my understanding is OK and if I'm
> > >> missing something?
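> > >>
> > >> (For concreteness, "bring down cluster services" presumably means
> > >> something like the following, run on one node at a time - just an
> > >> assumption for illustration, with placeholder node names:)
> > >>
> > >>   pcs cluster stop node2   # stop pacemaker and corosync on one node
> > >>   # wait ~3 minutes
> > >>   pcs cluster stop node3   # stop services on a second node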
> > >
> > >
> > > As said I've never used it ...
> > > Well, when down to 2 nodes LMS per definition is getting into trouble,
> > > as after another outage either of them is going to be alone. In case of
> > > an ordered shutdown this could possibly be circumvented though. So I
> > > guess your first attempt to enable auto-tie-breaker was the right idea.
> > > Like this you will keep the service running on at least one of the nodes.
> > > So I guess what you were seeing is the right - and unfortunately only
> > > possible - behavior.
> >
> > I still do not see where fencing comes from. Pacemaker requests
> > fencing of the missing nodes. It also may request self-fencing, but
> > not in the default settings. It is rather hard to tell what happens
> > without logs from the last remaining node.
> >
> > That said, the default action is to stop all resources, so the end
> > result is not very different :)
> >
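> > (For reference, that default corresponds to pacemaker's no-quorum-policy
> > cluster property; as a sketch, setting it explicitly would be
> >
> >   pcs property set no-quorum-policy=stop   # "stop" is the default
> >
> > - just an illustration, not taken from the poster's configuration.)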
>
> But you are of course right. The expected behaviour would be that
> the leftover node stops the resources.
> But maybe we're missing something here. Hard to tell without
> the exact configuration including fencing.
> Again, as already said, I don't know anything about the LMS
> implementation with corosync. In theory there are arguments both
> for suicide (but that would have to be done by pacemaker) and
> for automatically switching to some 2-node mode once the remaining
> partition is reduced to just 2, followed by a fence race (when done
> without the precautions otherwise used for 2-node clusters).
> But I guess in this case it is neither of those 2.
>
> Klaus
>
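> For reference, the "precautions otherwise used for 2-node clusters"
> mentioned above usually boil down to something like this in corosync.conf
> (a sketch, not taken from the poster's configuration):
>
>   quorum {
>       provider: corosync_votequorum
>       two_node: 1
>       # two_node implies wait_for_all, so a node that boots alone waits
>       # until it has seen its peer before assuming quorum
>   }
>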
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
> >
>
> ------------------------------
>
> Message: 3
> Date: Mon, 4 Sep 2023 14:44:25 +0100
> From: David Dolan <daithidolan at gmail.com>
> To: Klaus Wenninger <kwenning at redhat.com>, arvidjaar at gmail.com
> Cc: Cluster Labs - All topics related to open-source clustering
> welcomed <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] issue during Pacemaker failover testing
> Message-ID:
> <CAH1k77CSK64=
> BgMnYqJo6B4Gbbo2Q06Jhnp9xk2tCebraHvhbg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Thanks Klaus/Andrei,
>
> So if I understand correctly what I'm trying probably shouldn't work.
> And I should attempt setting auto_tie_breaker in corosync and remove
> last_man_standing.
> Then, I should set up another server with qdevice and configure that using
> the LMS algorithm.
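>
> In concrete terms, the qdevice part of that would presumably be something
> like this (hostnames are placeholders, just a sketch):
>
>   # on the new qdevice server
>   pcs qdevice setup model net --enable --start
>
>   # on one of the existing cluster nodes
>   pcs quorum device add model net host=qdevice-host.example.com algorithm=lms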
>
> Thanks
> David
>
> On Mon, 4 Sept 2023 at 13:32, Klaus Wenninger <kwenning at redhat.com> wrote:
>
> >
> >
> > On Mon, Sep 4, 2023 at 1:50 PM Andrei Borzenkov <arvidjaar at gmail.com>
> > wrote:
> >
> >> On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger <kwenning at redhat.com>
> >> wrote:
> >> >
> >> >
> >> >
> >> > On Mon, Sep 4, 2023 at 12:45 PM David Dolan <daithidolan at gmail.com>
> >> wrote:
> >> >>
> >> >> Hi Klaus,
> >> >>
> >> >> With default quorum options I've performed the following on my 3 node
> >> >> cluster:
> >> >>
> >> >> Bring down cluster services on one node - the running services migrate
> >> >> to another node
> >> >> Wait 3 minutes
> >> >> Bring down cluster services on one of the two remaining nodes - the
> >> >> surviving node in the cluster is then fenced
> >> >>
> >> >> Instead of the surviving node being fenced, I hoped that the services
> >> >> would migrate and run on that remaining node.
> >> >>
> >> >> Just looking for confirmation that my understanding is OK and if I'm
> >> >> missing something?
> >> >
> >> >
> >> > As said I've never used it ...
> >> > Well, when down to 2 nodes LMS per definition is getting into trouble,
> >> > as after another outage either of them is going to be alone. In case of
> >> > an ordered shutdown this could possibly be circumvented though. So I
> >> > guess your first attempt to enable auto-tie-breaker was the right idea.
> >> > Like this you will keep the service running on at least one of the
> >> > nodes.
> >> > So I guess what you were seeing is the right - and unfortunately only
> >> > possible - behavior.
> >>
> >> I still do not see where fencing comes from. Pacemaker requests
> >> fencing of the missing nodes. It also may request self-fencing, but
> >> not in the default settings. It is rather hard to tell what happens
> >> without logs from the last remaining node.
> >>
> >> That said, the default action is to stop all resources, so the end
> >> result is not very different :)
> >>
> >
> > But you are of course right. The expected behaviour would be that
> > the leftover node stops the resources.
> > But maybe we're missing something here. Hard to tell without
> > the exact configuration including fencing.
> > Again, as already said, I don't know anything about the LMS
> > implementation with corosync. In theory there are arguments both
> > for suicide (but that would have to be done by pacemaker) and
> > for automatically switching to some 2-node mode once the remaining
> > partition is reduced to just 2, followed by a fence race (when done
> > without the precautions otherwise used for 2-node clusters).
> > But I guess in this case it is neither of those 2.
> >
> > Klaus
> >
> >> _______________________________________________
> >> Manage your subscription:
> >> https://lists.clusterlabs.org/mailman/listinfo/users
> >>
> >> ClusterLabs home: https://www.clusterlabs.org/
> >>
> >
>
> ------------------------------
>
> Message: 4
> Date: Mon, 4 Sep 2023 16:23:40 +0200
> From: Jan Friesse <jfriesse at redhat.com>
> To: users at clusterlabs.org
> Subject: Re: [ClusterLabs] Centreon HA Cluster - VIP issue
> Message-ID: <cd344f85-a161-2fe1-9f4e-61d7497d208c at redhat.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Hi,
>
>
> On 02/09/2023 17:16, Adil Bouazzaoui wrote:
> > Hello,
> >
> > My name is Adil, I work for the Tman company. We are testing the Centreon
> > HA cluster to monitor our infrastructure for 13 companies; for now we are
> > using the 100 IT licence to test the platform, and once everything is
> > working fine we can purchase a licence suitable for our case.
> >
> > We're stuck at *scenario 2*: setting up the Centreon HA Cluster with Master
> > & Slave in two different datacenters.
> > *Scenario 1*, setting up the cluster with Master & Slave and the VIP
> > address on the same network (VLAN), is working fine.
> >
> > *Scenario 1: Cluster on Same network (same DC) ==> works fine*
> > Master in DC 1 VLAN 1: 172.30.15.10 /24
> > Slave in DC 1 VLAN 1: 172.30.15.20 /24
> > VIP in DC 1 VLAN 1: 172.30.15.30/24
> > Quorum in DC 1 LAN: 192.168.1.10/24
> > Poller in DC 1 LAN: 192.168.1.20/24
> >
> > *Scenario 2: Cluster on different networks (2 separate DCs connected with
> > VPN) ==> still not working*
>
> corosync on every node needs to have a direct connection to every other
> node. A VPN should work as long as routing is correctly configured. What
> exactly is "still not working"?
>
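> As a sketch of that (node names are placeholders, addresses taken from the
> scenario below), the corosync nodelist simply carries each node's routable
> address; the nodes do not have to share a subnet:
>
>   nodelist {
>       node {
>           ring0_addr: 172.30.15.10
>           name: centreon-master
>           nodeid: 1
>       }
>       node {
>           ring0_addr: 172.30.50.10
>           name: centreon-slave
>           nodeid: 2
>       }
>   }
>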
> > Master in DC 1 VLAN 1: 172.30.15.10 /24
> > Slave in DC 2 VLAN 2: 172.30.50.10 /24
> > VIP: for example 102.84.30.XXX. We used a public static IP from our
> > internet service provider; we thought that using an IP from a single
> > site's network wouldn't work, because if that site goes down the VIP
> > won't be reachable!
> > Quorum: 192.168.1.10/24
>
> No clue what you mean by Quorum, but placing it in DC1 doesn't feel right.
>
> > Poller: 192.168.1.20/24
> >
> > Our *goal* is to have the Master & Slave nodes on different sites, so that
> > when Site A goes down, we keep monitoring with the Slave.
> > The problem is that we don't know how to set up the VIP address, what kind
> > of VIP address will work, or how the VIP address can work in this
> > scenario - or whether there is anything else that can replace the VIP
> > address to make things work.
> > Also, can we use a backup poller, so that if poller 1 on Site A goes down,
> > poller 2 on Site B can take the lead?
> >
> > We looked everywhere (The Watch, YouTube, Reddit, GitHub...), and we still
> > couldn't find a workaround!
> >
> > The guide we used to deploy the 2-node cluster:
> > https://docs.centreon.com/docs/installation/installation-of-centreon-ha/overview/
> >
> > Attached is the 2-DC architecture example.
> >
> > We appreciate your support.
> > Thank you in advance.
> >
> >
> > Adil Bouazzaoui
> > IT Infrastructure Engineer
> > TMAN
> > adil.bouazzaoui at tmandis.ma
> > adilb574 at gmail.com
> > +212 656 29 2020
> >
> >
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
> >
>
>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
> ------------------------------
>
> End of Users Digest, Vol 104, Issue 5
> *************************************
>
--
*Adil Bouazzaoui*