[ClusterLabs] [EXTERNE] Re: Users Digest, Vol 104, Issue 5

Adil BOUAZZAOUI adil.bouazzaoui at tmandis.ma
Fri Sep 8 04:08:08 EDT 2023


Hi Jan,

Any update, please?


Regards
Adil Bouazzaoui

Adil BOUAZZAOUI
Ingénieur Infrastructures & Technologies
GSM    : +212 703 165 758
E-mail : adil.bouazzaoui at tmandis.ma


From: Adil BOUAZZAOUI
Sent: Tuesday, September 5, 2023 9:03 AM
To: Klaus Wenninger <kwenning at redhat.com>; Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Cc: jfriesse at redhat.com
Subject: RE: [EXTERNE] Re: [ClusterLabs] Users Digest, Vol 104, Issue 5

Hi Jan,

This is the correct reply:

To add more information: we deployed a Centreon 2-node HA cluster (Master in DC 1 and Slave in DC 2). The quorum device, which is responsible for resolving split-brain, is in DC 1 too, and the poller, which is responsible for monitoring, is also in DC 1. The problem is that a VIP address is required (attached to the Master node; on failover it is moved to the Slave) and we don't know which VIP we should use. We also don't know the right setup for our scenario, so that if DC 1 goes down the Slave in DC 2 becomes the Master; that's why we don't know where to place the quorum device and the poller.
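
For reference, here is a minimal sketch of how a pacemaker-managed VIP is usually defined, assuming the standard ocf:heartbeat:IPaddr2 agent (the address, netmask and resource names below are placeholders, and Centreon's own HA setup may name its resources differently):

    # Create the VIP resource; pacemaker adds/removes this address on the active node
    pcs resource create centreon_vip ocf:heartbeat:IPaddr2 \
        ip=172.30.15.30 cidr_netmask=24 \
        op monitor interval=30s

    # Keep the VIP on the node currently holding the master role
    # ('ms_centreon' is a hypothetical promotable resource name, for illustration only;
    # colocation syntax varies slightly across pcs versions)
    pcs constraint colocation add centreon_vip with master ms_centreon INFINITY

Note that an IPaddr2 VIP like this only works while both nodes sit on the same L2 subnet (scenario 1); it cannot simply move between two different subnets in two DCs, which is exactly the difficulty with scenario 2 discussed in this thread.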

I hope to get some ideas so we can set up this cluster correctly.
Thanks in advance.



Regards
Adil Bouazzaoui

Adil BOUAZZAOUI
Ingénieur Infrastructures & Technologies
GSM    : +212 703 165 758
E-mail : adil.bouazzaoui at tmandis.ma


From: Klaus Wenninger [mailto:kwenning at redhat.com]
Sent: Tuesday, September 5, 2023 7:28 AM
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Cc: jfriesse at redhat.com; Adil BOUAZZAOUI <adil.bouazzaoui at tmandis.ma>
Subject: [EXTERNE] Re: [ClusterLabs] Users Digest, Vol 104, Issue 5

Down below you replied to two threads. I think the latter is the one you intended to reply to ... very confusing ...
Sorry for adding more spam - I was hesitant - but I think there is a chance it removes some confusion ...

Klaus

On Mon, Sep 4, 2023 at 10:29 PM Adil Bouazzaoui <adilb574 at gmail.com> wrote:
Hi Jan,

To add more information: we deployed a Centreon 2-node HA cluster (Master in DC 1 and Slave in DC 2). The quorum device, which is responsible for resolving split-brain, is in DC 1 too, and the poller, which is responsible for monitoring, is also in DC 1. The problem is that a VIP address is required (attached to the Master node; on failover it is moved to the Slave) and we don't know which VIP we should use. We also don't know the right setup for our scenario, so that if DC 1 goes down the Slave in DC 2 becomes the Master; that's why we don't know where to place the quorum device and the poller.

I hope to get some ideas so we can set up this cluster correctly.
Thanks in advance.

Adil Bouazzaoui
IT Infrastructure engineer
adil.bouazzaoui at tmandis.ma
adilb574 at gmail.com

On Mon, Sep 4, 2023 at 3:24 PM, <users-request at clusterlabs.org> wrote:
Send Users mailing list submissions to
        users at clusterlabs.org

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.clusterlabs.org/mailman/listinfo/users
or, via email, send a message with subject or body 'help' to
        users-request at clusterlabs.org

You can reach the person managing the list at
        users-owner at clusterlabs.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Users digest..."


Today's Topics:

   1. Re: issue during Pacemaker failover testing (Klaus Wenninger)
   2. Re: issue during Pacemaker failover testing (Klaus Wenninger)
   3. Re: issue during Pacemaker failover testing (David Dolan)
   4. Re: Centreon HA Cluster - VIP issue (Jan Friesse)


----------------------------------------------------------------------

Message: 1
Date: Mon, 4 Sep 2023 14:15:52 +0200
From: Klaus Wenninger <kwenning at redhat.com>
To: Cluster Labs - All topics related to open-source clustering
        welcomed <users at clusterlabs.org>
Cc: David Dolan <daithidolan at gmail.com>
Subject: Re: [ClusterLabs] issue during Pacemaker failover testing
Message-ID:
        <CALrDAo0XqSRZ69LRArOPrLOOxwmCy1UuwqFPXsQzSC=WODyhTQ at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Mon, Sep 4, 2023 at 1:44 PM Andrei Borzenkov <arvidjaar at gmail.com> wrote:

> On Mon, Sep 4, 2023 at 2:25 PM Klaus Wenninger <kwenning at redhat.com>
> wrote:
> >
> >
> > Or go for qdevice with LMS where I would expect it to be able to really
> > go down to a single node left - any of the 2 last ones - as there is
> > still qdevice.
> > Sorry for the confusion btw.
> >
>
> According to documentation, "LMS is also incompatible with quorum
> devices, if last_man_standing is specified in corosync.conf then the
> quorum device will be disabled".
>

That is why I said qdevice with LMS - but it was probably not explicit
enough; I meant the qdevice algorithm, not the corosync flag.
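
In corosync.conf terms the distinction is roughly this (a sketch; the qnetd host name is a placeholder):

    quorum {
        provider: corosync_votequorum

        # The votequorum *flag* the documentation warns about; it cannot be
        # combined with a quorum device:
        #   last_man_standing: 1

        # The qdevice *algorithm* meant here is set on the device itself:
        device {
            model: net
            net {
                host: qnetd.example.com    # placeholder qnetd server
                algorithm: lms
            }
        }
    }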

Klaus

> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>

------------------------------

Message: 2
Date: Mon, 4 Sep 2023 14:32:39 +0200
From: Klaus Wenninger <kwenning at redhat.com>
To: Cluster Labs - All topics related to open-source clustering
        welcomed <users at clusterlabs.org>
Cc: David Dolan <daithidolan at gmail.com>
Subject: Re: [ClusterLabs] issue during Pacemaker failover testing
Message-ID:
        <CALrDAo0V8BXp4AjWCobKeAE6PimvGG2xME6iA+OHxSHEsX90Ag at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Mon, Sep 4, 2023 at 1:50 PM Andrei Borzenkov <arvidjaar at gmail.com> wrote:

> On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger <kwenning at redhat.com>
> wrote:
> >
> >
> >
> > On Mon, Sep 4, 2023 at 12:45 PM David Dolan <daithidolan at gmail.com>
> wrote:
> >>
> >> Hi Klaus,
> >>
> >> With default quorum options I've performed the following on my 3-node
> >> cluster:
> >>
> >> Bring down cluster services on one node - the running services migrate
> >> to another node
> >> Wait 3 minutes
> >> Bring down cluster services on one of the two remaining nodes - the
> >> surviving node in the cluster is then fenced
> >>
> >> Instead of the surviving node being fenced, I hoped that the services
> >> would migrate and run on that remaining node.
> >>
> >> Just looking for confirmation that my understanding is ok and if I'm
> >> missing something?
> >
> >
> > As said I've never used it ...
> > Well when down to 2 nodes LMS per definition is getting into trouble as
> > after another outage any of them is gonna be alone. In case of an ordered
> > shutdown this could possibly be circumvented though. So I guess your first
> > attempt to enable auto-tie-breaker was the right idea. Like this you will
> > have further service at least on one of the nodes.
> > So I guess what you were seeing is the right - and unfortunately only
> > possible - behavior.
>
> I still do not see where fencing comes from. Pacemaker requests
> fencing of the missing nodes. It also may request self-fencing, but
> not in the default settings. It is rather hard to tell what happens
> without logs from the last remaining node.
>
> That said, the default action is to stop all resources, so the end
> result is not very different :)
>

But you are of course right. The expected behaviour would be that
the leftover node stops the resources.
But maybe we're missing something here. Hard to tell without
the exact configuration including fencing.
Again, as already said, I don't know anything about the LMS
implementation with corosync. In theory there were both arguments
to either suicide (but that would have to be done by pacemaker) or
to automatically switch to some 2-node-mode once the remaining
partition is reduced to just 2 followed by a fence-race (when done
without the precautions otherwise used for 2-node-clusters).
But I guess in this case it is none of those 2.
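
(For context, the "precautions otherwise used for 2-node-clusters" are, in corosync terms, roughly the following - a sketch, not a recommendation for this setup:

    quorum {
        provider: corosync_votequorum
        two_node: 1     # each node keeps quorum when its peer disappears,
                        # relying on fencing to resolve the resulting race;
                        # two_node implicitly enables wait_for_all, so both
                        # nodes must be seen once before startup completes
    }
)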

Klaus

> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>

------------------------------

Message: 3
Date: Mon, 4 Sep 2023 14:44:25 +0100
From: David Dolan <daithidolan at gmail.com>
To: Klaus Wenninger <kwenning at redhat.com>, arvidjaar at gmail.com
Cc: Cluster Labs - All topics related to open-source clustering
        welcomed <users at clusterlabs.org>
Subject: Re: [ClusterLabs] issue during Pacemaker failover testing
Message-ID:
        <CAH1k77CSK64=BgMnYqJo6B4Gbbo2Q06Jhnp9xk2tCebraHvhbg at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Thanks Klaus/Andrei,

So if I understand correctly, what I'm trying probably shouldn't work.
I should set auto_tie_breaker in corosync and remove last_man_standing.
Then I should set up another server as a quorum device (qnetd) and
configure it to use the LMS algorithm.
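
If it helps, a rough sketch of that plan with pcs (the qnetd host name is a placeholder, exact commands can vary with the pcs version, and changing quorum options usually requires the cluster to be stopped):

    # On the cluster: drop last_man_standing, enable auto_tie_breaker
    pcs quorum update last_man_standing=0 auto_tie_breaker=1

    # On the new third server: run the qnetd daemon
    dnf install corosync-qnetd
    systemctl enable --now corosync-qnetd

    # Back on the cluster: register the quorum device with the LMS algorithm
    pcs quorum device add model net host=qnetd.example.com algorithm=lms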

Thanks
David

On Mon, 4 Sept 2023 at 13:32, Klaus Wenninger <kwenning at redhat.com> wrote:

>
>
> On Mon, Sep 4, 2023 at 1:50 PM Andrei Borzenkov <arvidjaar at gmail.com>
> wrote:
>
>> On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger <kwenning at redhat.com>
>> wrote:
>> >
>> >
>> >
>> > On Mon, Sep 4, 2023 at 12:45 PM David Dolan <daithidolan at gmail.com>
>> wrote:
>> >>
>> >> Hi Klaus,
>> >>
>> >> With default quorum options I've performed the following on my 3-node
>> >> cluster:
>> >>
>> >> Bring down cluster services on one node - the running services migrate
>> >> to another node
>> >> Wait 3 minutes
>> >> Bring down cluster services on one of the two remaining nodes - the
>> >> surviving node in the cluster is then fenced
>> >>
>> >> Instead of the surviving node being fenced, I hoped that the services
>> >> would migrate and run on that remaining node.
>> >>
>> >> Just looking for confirmation that my understanding is ok and if I'm
>> >> missing something?
>> >
>> >
>> > As said I've never used it ...
>> > Well when down to 2 nodes LMS per definition is getting into trouble as
>> > after another outage any of them is gonna be alone. In case of an ordered
>> > shutdown this could possibly be circumvented though. So I guess your first
>> > attempt to enable auto-tie-breaker was the right idea. Like this you will
>> > have further service at least on one of the nodes.
>> > So I guess what you were seeing is the right - and unfortunately only
>> > possible - behavior.
>>
>> I still do not see where fencing comes from. Pacemaker requests
>> fencing of the missing nodes. It also may request self-fencing, but
>> not in the default settings. It is rather hard to tell what happens
>> without logs from the last remaining node.
>>
>> That said, the default action is to stop all resources, so the end
>> result is not very different :)
>>
>
> But you are of course right. The expected behaviour would be that
> the leftover node stops the resources.
> But maybe we're missing something here. Hard to tell without
> the exact configuration including fencing.
> Again, as already said, I don't know anything about the LMS
> implementation with corosync. In theory there were both arguments
> to either suicide (but that would have to be done by pacemaker) or
> to automatically switch to some 2-node-mode once the remaining
> partition is reduced to just 2 followed by a fence-race (when done
> without the precautions otherwise used for 2-node-clusters).
> But I guess in this case it is none of those 2.
>
> Klaus
>
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
>

------------------------------

Message: 4
Date: Mon, 4 Sep 2023 16:23:40 +0200
From: Jan Friesse <jfriesse at redhat.com>
To: users at clusterlabs.org
Subject: Re: [ClusterLabs] Centreon HA Cluster - VIP issue
Message-ID: <cd344f85-a161-2fe1-9f4e-61d7497d208c at redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi,


On 02/09/2023 17:16, Adil Bouazzaoui wrote:
>   Hello,
>
> My name is Adil; I work for the Tman company. We are testing the Centreon HA
> cluster to monitor our infrastructure for 13 companies. For now we are
> using the 100 IT licence to test the platform; once everything is working
> fine we can purchase a licence suitable for our case.
>
> We're stuck at *scenario 2*: setting up the Centreon HA Cluster with Master &
> Slave in different datacenters.
> *Scenario 1*, setting up the Cluster with Master & Slave and the VIP
> address on the same network (VLAN), is working fine.
>
> *Scenario 1: Cluster on Same network (same DC) ==> works fine*
> Master in DC 1 VLAN 1: 172.30.15.10/24
> Slave in DC 1 VLAN 1: 172.30.15.20/24
> VIP in DC 1 VLAN 1: 172.30.15.30/24
> Quorum in DC 1 LAN: 192.168.1.10/24
> Poller in DC 1 LAN: 192.168.1.20/24
>
> *Scenario 2: Cluster on different networks (2 separate DCs connected with
> VPN) ==> still not working*

corosync on all nodes needs to have a direct connection to every other node.
VPN should work as long as routing is correctly configured. What exactly
is "still not working"?
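
For illustration, the nodelist would then look roughly like this, with each ring0_addr reachable from the other node across the VPN (node names are placeholders; the addresses are the ones from your scenario):

    nodelist {
        node {
            ring0_addr: 172.30.15.10    # Master, DC 1 / VLAN 1
            name: centreon-node1
            nodeid: 1
        }
        node {
            ring0_addr: 172.30.50.10    # Slave, DC 2 / VLAN 2
            name: centreon-node2
            nodeid: 2
        }
    }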

> Master in DC 1 VLAN 1: 172.30.15.10/24
> Slave in DC 2 VLAN 2: 172.30.50.10/24
> VIP: example 102.84.30.XXX. We used a public static IP from our internet
> service provider; we thought that using an IP from a site network won't
> work, because if the site goes down then the VIP won't be reachable!
> Quorum: 192.168.1.10/24

No clue what you mean by Quorum, but placing it in DC1 doesn't feel right.

> Poller: 192.168.1.20/24
>
> Our *goal* is to have Master & Slave nodes on different sites, so when Site
> A goes down, we keep monitoring with the Slave.
> The problem is that we don't know how to set up the VIP address, what
> kind of VIP address will work, or how the VIP address can work in this
> scenario - or whether there is anything else that can replace the VIP
> address to make things work.
> Also, can we use a backup poller, so that if poller 1 on Site A goes down,
> poller 2 on Site B can take the lead?
>
> We looked everywhere (The Watch, YouTube, Reddit, GitHub...), and we still
> couldn't find a workaround!
>
> the guide we used to deploy the 2 Nodes Cluster:
> https://docs.centreon.com/docs/installation/installation-of-centreon-ha/overview/
>
> attached the 2 DCs architecture example.
>
> We appreciate your support.
> Thank you in advance.
>
>
> Adil Bouazzaoui
> IT Infrastructure Engineer
> TMAN
> adil.bouazzaoui at tmandis.ma
> adilb574 at gmail.com
> +212 656 29 2020
>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>



------------------------------

Subject: Digest Footer

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


------------------------------

End of Users Digest, Vol 104, Issue 5
*************************************


--



Adil Bouazzaoui
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

