[ClusterLabs] Users Digest, Vol 104, Issue 5

Klaus Wenninger kwenning at redhat.com
Tue Sep 5 02:28:28 EDT 2023


Down below you replied to two threads. I think the latter is the one you
intended to ... very confusing ...
Sorry for adding more spam - I was hesitant - but I think there is a chance it
removes some of the confusion ...

Klaus

On Mon, Sep 4, 2023 at 10:29 PM Adil Bouazzaoui <adilb574 at gmail.com> wrote:

> Hi Jan,
>
> To add more information: we deployed a Centreon 2-node HA cluster (Master in
> DC 1 & Slave in DC 2); the quorum device, which is responsible for resolving
> split-brain, is in DC 1 too, and the poller, which is responsible for the
> monitoring, is in DC 1 as well. The problem is that a VIP address is required
> (attached to the Master node; on failover it is moved to the Slave) and we
> don't know which VIP we should use. We also don't know the right setup for
> our scenario, where the Slave in DC 2 becomes Master if DC 1 goes down, which
> is why we don't know where to place the quorum device and the poller.
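>
> For reference, in a Pacemaker-based setup such a VIP is usually an
> ocf:heartbeat:IPaddr2 resource kept on whichever node holds the master role
> via a colocation constraint. A minimal sketch - the resource names are
> placeholders and the address is the scenario-1 VIP quoted further down, not a
> suggestion for the cross-DC case:
>
>     # define the floating VIP as a cluster resource
>     pcs resource create centreon_vip ocf:heartbeat:IPaddr2 \
>         ip=172.30.15.30 cidr_netmask=24 op monitor interval=30s
>     # keep the VIP on the node currently running the master
>     # (ms_mysql is a placeholder for the promotable resource's real name)
>     pcs constraint colocation add centreon_vip with master ms_mysql INFINITY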
>
> I hope to get some ideas so we can set up this cluster correctly.
> Thanks in advance.
>
> Adil Bouazzaoui
> IT Infrastructure engineer
> adil.bouazzaoui at tmandis.ma
> adilb574 at gmail.com
>
> On Mon, 4 Sept 2023 at 15:24, <users-request at clusterlabs.org> wrote:
>
>> Send Users mailing list submissions to
>>         users at clusterlabs.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>         https://lists.clusterlabs.org/mailman/listinfo/users
>> or, via email, send a message with subject or body 'help' to
>>         users-request at clusterlabs.org
>>
>> You can reach the person managing the list at
>>         users-owner at clusterlabs.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Users digest..."
>>
>>
>> Today's Topics:
>>
>>    1. Re: issue during Pacemaker failover testing (Klaus Wenninger)
>>    2. Re: issue during Pacemaker failover testing (Klaus Wenninger)
>>    3. Re: issue during Pacemaker failover testing (David Dolan)
>>    4. Re: Centreon HA Cluster - VIP issue (Jan Friesse)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Mon, 4 Sep 2023 14:15:52 +0200
>> From: Klaus Wenninger <kwenning at redhat.com>
>> To: Cluster Labs - All topics related to open-source clustering
>>         welcomed <users at clusterlabs.org>
>> Cc: David Dolan <daithidolan at gmail.com>
>> Subject: Re: [ClusterLabs] issue during Pacemaker failover testing
>> Message-ID:
>>         <CALrDAo0XqSRZ69LRArOPrLOOxwmCy1UuwqFPXsQzSC=
>> WODyhTQ at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> On Mon, Sep 4, 2023 at 1:44 PM Andrei Borzenkov <arvidjaar at gmail.com>
>> wrote:
>>
>> > On Mon, Sep 4, 2023 at 2:25 PM Klaus Wenninger <kwenning at redhat.com>
>> > wrote:
>> > >
>> > >
>> > > Or go for qdevice with LMS, where I would expect it to be able to really
>> > > go down to a single node left - either of the last 2 - as there is still
>> > > qdevice.
>> > > Sorry for the confusion btw.
>> > >
>> >
>> > According to documentation, "LMS is also incompatible with quorum
>> > devices, if last_man_standing is specified in corosync.conf then the
>> > quorum device will be disabled".
>> >
>>
>> That is why I said qdevice with LMS - but that was probably not explicit
>> enough without saying that I meant the qdevice algorithm and not
>> the corosync flag.
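>>
>> Just to make that distinction concrete: the idea is to leave
>> last_man_standing out of the votequorum options entirely and select LMS as
>> the qdevice algorithm instead. A rough corosync.conf sketch, assuming a
>> qnetd host reachable as qnetd-server:
>>
>>     quorum {
>>         provider: corosync_votequorum
>>         # note: no last_man_standing flag here - per the votequorum docs
>>         # that flag would disable the quorum device
>>         device {
>>             model: net
>>             net {
>>                 host: qnetd-server
>>                 algorithm: lms    # the qdevice LMS algorithm
>>             }
>>         }
>>     }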
>>
>> Klaus
>>
>> > _______________________________________________
>> > Manage your subscription:
>> > https://lists.clusterlabs.org/mailman/listinfo/users
>> >
>> > ClusterLabs home: https://www.clusterlabs.org/
>> >
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Mon, 4 Sep 2023 14:32:39 +0200
>> From: Klaus Wenninger <kwenning at redhat.com>
>> To: Cluster Labs - All topics related to open-source clustering
>>         welcomed <users at clusterlabs.org>
>> Cc: David Dolan <daithidolan at gmail.com>
>> Subject: Re: [ClusterLabs] issue during Pacemaker failover testing
>> Message-ID:
>>         <
>> CALrDAo0V8BXp4AjWCobKeAE6PimvGG2xME6iA+OHxSHEsX90Ag at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> On Mon, Sep 4, 2023 at 1:50 PM Andrei Borzenkov <arvidjaar at gmail.com>
>> wrote:
>>
>> > On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger <kwenning at redhat.com>
>> > wrote:
>> > >
>> > >
>> > >
>> > > On Mon, Sep 4, 2023 at 12:45 PM David Dolan <daithidolan at gmail.com>
>> > wrote:
>> > >>
>> > >> Hi Klaus,
>> > >>
>> > >> With default quorum options I've performed the following on my 3-node
>> > >> cluster:
>> > >>
>> > >> Bring down cluster services on one node - the running services migrate
>> > >> to another node
>> > >> Wait 3 minutes
>> > >> Bring down cluster services on one of the two remaining nodes - the
>> > >> surviving node in the cluster is then fenced
>> > >>
>> > >> Instead of the surviving node being fenced, I hoped that the services
>> > >> would migrate and run on that remaining node.
>> > >>
>> > >> Just looking for confirmation that my understanding is OK, and whether
>> > >> I'm missing something.
>> > >
>> > >
>> > > As said, I've never used it ...
>> > > Well, when down to 2 nodes LMS by definition gets into trouble, as after
>> > > another outage either of them is going to be alone. In case of an ordered
>> > > shutdown this could possibly be circumvented, though. So I guess your first
>> > > attempt to enable auto_tie_breaker was the right idea. Like this you will
>> > > still have service on at least one of the nodes.
>> > > So I guess what you were seeing is the right - and unfortunately the only
>> > > possible - behavior.
>> >
>> > I still do not see where fencing comes from. Pacemaker requests
>> > fencing of the missing nodes. It also may request self-fencing, but
>> > not in the default settings. It is rather hard to tell what happens
>> > without logs from the last remaining node.
>> >
>> > That said, the default action is to stop all resources, so the end
>> > result is not very different :)
>> >
>>
>> But you are of course right. The expected behaviour would be that
>> the leftover node stops the resources.
>> But maybe we're missing something here. Hard to tell without
>> the exact configuration including fencing.
>> Again, as already said, I don't know anything about the LMS
>> implementation with corosync. In theory there are arguments both for
>> suicide (but that would have to be done by Pacemaker) and for
>> automatically switching to some 2-node mode once the remaining
>> partition is reduced to just 2 - followed by a fence race (if done
>> without the precautions otherwise used for 2-node clusters).
>> But I guess in this case it is neither of those 2.
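>>
>> For completeness, the usual 2-node precautions and the "stop everything"
>> default referred to above live in two different places; a sketch (the values
>> shown are the common defaults, not a suggestion for this cluster):
>>
>>     # corosync.conf - typical two-node safeguards
>>     quorum {
>>         provider: corosync_votequorum
>>         two_node: 1        # keep quorum with only one of the two nodes
>>         wait_for_all: 1    # implied by two_node; no quorum until both
>>                            # nodes have been seen at least once
>>     }
>>
>>     # pacemaker side - what to do when quorum is lost
>>     # ("stop" is the default, i.e. stop all resources)
>>     pcs property set no-quorum-policy=stop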
>>
>> Klaus
>>
>> > _______________________________________________
>> > Manage your subscription:
>> > https://lists.clusterlabs.org/mailman/listinfo/users
>> >
>> > ClusterLabs home: https://www.clusterlabs.org/
>> >
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Mon, 4 Sep 2023 14:44:25 +0100
>> From: David Dolan <daithidolan at gmail.com>
>> To: Klaus Wenninger <kwenning at redhat.com>, arvidjaar at gmail.com
>> Cc: Cluster Labs - All topics related to open-source clustering
>>         welcomed <users at clusterlabs.org>
>> Subject: Re: [ClusterLabs] issue during Pacemaker failover testing
>> Message-ID:
>>         <CAH1k77CSK64=
>> BgMnYqJo6B4Gbbo2Q06Jhnp9xk2tCebraHvhbg at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Thanks Klaus/Andrei,
>>
>> So if I understand correctly, what I'm trying probably shouldn't work.
>> I should instead attempt setting auto_tie_breaker in corosync and remove
>> last_man_standing.
>> Then, I should set up another server with qdevice and configure that using
>> the LMS algorithm.
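>>
>> Roughly, assuming pcs is used everywhere and the new server is reachable as
>> qnetd-server, the qdevice part of that plan would look something like this
>> (an untested sketch; package names may differ per distro):
>>
>>     # drop last_man_standing from the corosync quorum options
>>     # (pcs may require the cluster to be stopped for this change)
>>     pcs quorum update last_man_standing=0
>>
>>     # on the new server (needs the corosync-qnetd package)
>>     pcs qdevice setup model net --enable --start
>>
>>     # on one cluster node (nodes need the corosync-qdevice package)
>>     pcs quorum device add model net host=qnetd-server algorithm=lms
>>
>>     # verify the quorum/qdevice state
>>     pcs quorum status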
>>
>> Thanks
>> David
>>
>> On Mon, 4 Sept 2023 at 13:32, Klaus Wenninger <kwenning at redhat.com>
>> wrote:
>>
>> >
>> >
>> > On Mon, Sep 4, 2023 at 1:50 PM Andrei Borzenkov <arvidjaar at gmail.com>
>> > wrote:
>> >
>> >> On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger <kwenning at redhat.com>
>> >> wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Sep 4, 2023 at 12:45 PM David Dolan <daithidolan at gmail.com>
>> >> wrote:
>> >> >>
>> >> >> Hi Klaus,
>> >> >>
>> >> >> With default quorum options I've performed the following on my 3-node
>> >> >> cluster:
>> >> >>
>> >> >> Bring down cluster services on one node - the running services migrate
>> >> >> to another node
>> >> >> Wait 3 minutes
>> >> >> Bring down cluster services on one of the two remaining nodes - the
>> >> >> surviving node in the cluster is then fenced
>> >> >>
>> >> >> Instead of the surviving node being fenced, I hoped that the services
>> >> >> would migrate and run on that remaining node.
>> >> >>
>> >> >> Just looking for confirmation that my understanding is OK, and whether
>> >> >> I'm missing something.
>> >> >
>> >> >
>> >> > As said, I've never used it ...
>> >> > Well, when down to 2 nodes LMS by definition gets into trouble, as after
>> >> > another outage either of them is going to be alone. In case of an ordered
>> >> > shutdown this could possibly be circumvented, though. So I guess your first
>> >> > attempt to enable auto_tie_breaker was the right idea. Like this you will
>> >> > still have service on at least one of the nodes.
>> >> > So I guess what you were seeing is the right - and unfortunately the only
>> >> > possible - behavior.
>> >>
>> >> I still do not see where fencing comes from. Pacemaker requests
>> >> fencing of the missing nodes. It also may request self-fencing, but
>> >> not in the default settings. It is rather hard to tell what happens
>> >> without logs from the last remaining node.
>> >>
>> >> That said, the default action is to stop all resources, so the end
>> >> result is not very different :)
>> >>
>> >
>> > But you are of course right. The expected behaviour would be that
>> > the leftover node stops the resources.
>> > But maybe we're missing something here. Hard to tell without
>> > the exact configuration including fencing.
>> > Again, as already said, I don't know anything about the LMS
>> > implementation with corosync. In theory there are arguments both for
>> > suicide (but that would have to be done by Pacemaker) and for
>> > automatically switching to some 2-node mode once the remaining
>> > partition is reduced to just 2 - followed by a fence race (if done
>> > without the precautions otherwise used for 2-node clusters).
>> > But I guess in this case it is neither of those 2.
>> >
>> > Klaus
>> >
>> >> _______________________________________________
>> >> Manage your subscription:
>> >> https://lists.clusterlabs.org/mailman/listinfo/users
>> >>
>> >> ClusterLabs home: https://www.clusterlabs.org/
>> >>
>> >
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Mon, 4 Sep 2023 16:23:40 +0200
>> From: Jan Friesse <jfriesse at redhat.com>
>> To: users at clusterlabs.org
>> Subject: Re: [ClusterLabs] Centreon HA Cluster - VIP issue
>> Message-ID: <cd344f85-a161-2fe1-9f4e-61d7497d208c at redhat.com>
>> Content-Type: text/plain; charset=utf-8; format=flowed
>>
>> Hi,
>>
>>
>> On 02/09/2023 17:16, Adil Bouazzaoui wrote:
>> >   Hello,
>> >
>> > My name is Adil; I work for Tman. We are testing the Centreon HA
>> > cluster to monitor our infrastructure for 13 companies. For now we are
>> > using the 100 IT licence to test the platform; once everything is working
>> > fine, we can purchase a licence suitable for our case.
>> >
>> > We're stuck at *scenario 2*: setting up the Centreon HA Cluster with
>> > Master & Slave in different datacenters.
>> > *Scenario 1*, setting up the cluster with Master & Slave and the VIP
>> > address on the same network (VLAN), is working fine.
>> >
>> > *Scenario 1: Cluster on Same network (same DC) ==> works fine*
>> > Master in DC 1 VLAN 1: 172.30.15.10 /24
>> > Slave in DC 1 VLAN 1: 172.30.15.20 /24
>> > VIP in DC 1 VLAN 1: 172.30.15.30/24
>> > Quorum in DC 1 LAN: 192.168.1.10/24
>> > Poller in DC 1 LAN: 192.168.1.20/24
>> >
>> > *Scenario 2: Cluster on different networks (2 separate DCs connected with
>> > VPN) ==> still not working*
>>
>> corosync on every node needs a direct connection to every other node. A
>> VPN should work as long as routing is correctly configured. What exactly
>> is "still not working"?
>>
>> > Master in DC 1 VLAN 1: 172.30.15.10 /24
>> > Slave in DC 2 VLAN 2: 172.30.50.10 /24
>> > VIP: for example 102.84.30.XXX. We used a public static IP from our internet
>> > service provider; we thought that using an IP from one site's network wouldn't
>> > work, because if that site goes down the VIP won't be reachable!
>> > Quorum: 192.168.1.10/24
>>
>> No clue what you mean by Quorum, but placing it in DC1 doesn't feel right.
>>
>> > Poller: 192.168.1.20/24
>> >
>> > Our *goal* is to have the Master & Slave nodes on different sites, so that
>> > when Site A goes down, we keep monitoring with the Slave.
>> > The problem is that we don't know how to set up the VIP address, what kind
>> > of VIP address will work, or how the VIP address can work in this scenario -
>> > or whether there is anything else that can replace the VIP address to make
>> > things work.
>> > Also, can we use a backup poller, so that if poller 1 on Site A goes down,
>> > then poller 2 on Site B can take the lead?
>> >
>> > We looked everywhere (The Watch, YouTube, Reddit, GitHub...), and we still
>> > couldn't find a workaround!
>> >
>> > The guide we used to deploy the 2-node cluster:
>> >
>> > https://docs.centreon.com/docs/installation/installation-of-centreon-ha/overview/
>> >
>> > Attached is an example of the two-DC architecture.
>> >
>> > We appreciate your support.
>> > Thank you in advance.
>> >
>> >
>> > Adil Bouazzaoui
>> > IT Infrastructure Engineer
>> > TMAN
>> > adil.bouazzaoui at tmandis.ma
>> > adilb574 at gmail.com
>> > +212 656 29 2020
>> >
>> >
>> > _______________________________________________
>> > Manage your subscription:
>> > https://lists.clusterlabs.org/mailman/listinfo/users
>> >
>> > ClusterLabs home: https://www.clusterlabs.org/
>> >
>>
>>
>>
>> ------------------------------
>>
>> Subject: Digest Footer
>>
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
>>
>> ------------------------------
>>
>> End of Users Digest, Vol 104, Issue 5
>> *************************************
>>
>
>
> --
>
>
> *Adil Bouazzaoui*
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>

