[ClusterLabs] issue during Pacemaker failover testing

Klaus Wenninger kwenning at redhat.com
Mon Sep 4 08:32:39 EDT 2023


On Mon, Sep 4, 2023 at 1:50 PM Andrei Borzenkov <arvidjaar at gmail.com> wrote:

> On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger <kwenning at redhat.com> wrote:
> >
> >
> >
> > On Mon, Sep 4, 2023 at 12:45 PM David Dolan <daithidolan at gmail.com> wrote:
> >>
> >> Hi Klaus,
> >>
> >> With default quorum options I've performed the following on my 3-node cluster:
> >>
> >> Bring down cluster services on one node - the running services migrate to another node
> >> Wait 3 minutes
> >> Bring down cluster services on one of the two remaining nodes - the surviving node in the cluster is then fenced
> >>
> >> Instead of the surviving node being fenced, I hoped that the services would migrate and run on that remaining node.
> >>
> >> Just looking for confirmation that my understanding is OK, and whether I'm missing something.
> >
> >
> > As said, I've never used it ...
> > Well, when down to 2 nodes, LMS by definition gets into trouble, as after another
> > outage either of them is going to be alone. In case of an ordered shutdown this could
> > possibly be circumvented, though. So I guess your first attempt to enable auto-tie-breaker
> > was the right idea. That way you will have continued service on at least one of the nodes.
> > So I guess what you were seeing is the right (and unfortunately the only possible) behavior.
>
> I still do not see where fencing comes from. Pacemaker requests
> fencing of the missing nodes. It also may request self-fencing, but
> not in the default settings. It is rather hard to tell what happens
> without logs from the last remaining node.
>
> That said, the default action is to stop all resources, so the end
> result is not very different :)
>

But you are of course right. The expected behaviour would be that
the leftover node stops the resources.
But maybe we're missing something here. Hard to tell without
the exact configuration including fencing.
Again, as already said, I don't know anything about the LMS
implementation in corosync. In theory there are arguments both
for suicide (though that would have to be done by Pacemaker) and
for automatically switching to some 2-node mode once the remaining
partition is reduced to just 2 nodes, followed by a fence race (when
done without the precautions otherwise used for 2-node clusters).
But I guess in this case it is neither of those two.
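To make the vote arithmetic behind the LMS discussion concrete, here is a rough simulation of the test sequence (3 nodes, then 2, then 1). It is a sketch under stated assumptions, not corosync code: every node has 1 vote, quorum is a simple majority of expected_votes, last_man_standing_window has expired between failures so expected_votes has been recalculated, and expected_votes does not drop below 2 unless auto_tie_breaker is enabled (as described in votequorum(5)). The function names are hypothetical, and treating the tie-breaker as the lowest node id among the members present just before the failure is a simplifying assumption. Pacemaker's reaction to lost quorum (stopping resources, fencing) is not modeled.

```python
"""Hypothetical sketch of votequorum-style arithmetic with last_man_standing.

Assumptions (illustrative only): 1 vote per node, quorum = expected // 2 + 1,
LMS window expired between failures, tie-breaker = lowest node id among the
members present just before the failure.
"""


def quorum_for(expected_votes: int) -> int:
    # Simple majority of the expected votes.
    return expected_votes // 2 + 1


def is_quorate(alive, expected_votes, auto_tie_breaker=False, previous_members=None):
    """Return True if the partition `alive` (a set of node ids) is quorate."""
    votes = len(alive)
    if votes >= quorum_for(expected_votes):
        return True
    # Assumed auto_tie_breaker rule: on an exact 50% split, the half that
    # still contains the lowest node id of the previous membership survives.
    if auto_tie_breaker and previous_members and votes * 2 == expected_votes:
        return min(previous_members) in alive
    return False


def simulate(nodes, failures, auto_tie_breaker=False):
    """Fail nodes one at a time; return (failed, votes, expected, quorate) tuples."""
    alive = set(nodes)
    expected = len(nodes)
    history = []
    for node in failures:
        previous_members = set(alive)
        alive.discard(node)
        quorate = is_quorate(alive, expected, auto_tie_breaker, previous_members)
        history.append((node, len(alive), expected, quorate))
        if not quorate:
            break  # inquorate: no further LMS recalculation happens
        # Assumed LMS step: after the window, expected_votes drops to the
        # current vote count, but not below 2 unless auto_tie_breaker is on.
        expected = max(len(alive), 1 if auto_tie_breaker else 2)
    return history


if __name__ == "__main__":
    # LMS only: the second shutdown leaves a single, inquorate node.
    print(simulate([1, 2, 3], [1, 2]))  # -> [(1, 2, 3, True), (2, 1, 2, False)]
    # LMS + auto_tie_breaker: the survivor stays quorate only if it holds the tie-breaker.
    print(simulate([1, 2, 3], [1, 3], auto_tie_breaker=True))
```

Under these assumptions, LMS alone cannot carry a 2-node partition through one more outage, which matches the point above that auto-tie-breaker is what lets one node continue. With auto_tie_breaker, which node is stopped second decides whether the survivor keeps quorum.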

Klaus


