[ClusterLabs] issue during Pacemaker failover testing

David Dolan daithidolan at gmail.com
Mon Sep 4 09:44:25 EDT 2023


Thanks Klaus/Andrei,

So if I understand correctly, what I'm trying probably shouldn't work.
Instead, I should try setting auto_tie_breaker in corosync and remove
last_man_standing.
Then I should set up another server with qdevice, configured to use the
LMS algorithm.
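
Sketching that plan in corosync.conf terms (a sketch only - the qnetd host
name below is made up, and note that the votequorum man page says
auto_tie_breaker and last_man_standing cannot be combined with a quorum
device, so these two would be alternatives rather than steps):

```
# Option A: auto_tie_breaker instead of last_man_standing
quorum {
    provider: corosync_votequorum
    auto_tie_breaker: 1
    auto_tie_breaker_node: lowest   # default; lowest nodeid's partition survives a 50/50 split
}

# Option B: a separate qnetd server plus a qdevice using the LMS algorithm
# (auto_tie_breaker/last_man_standing must be off when a device is configured)
quorum {
    provider: corosync_votequorum
    device {
        model: net
        net {
            host: qnetd.example.com   # hypothetical qnetd host
            algorithm: lms
        }
    }
}
```

With pcs, option B would be along the lines of
`pcs quorum device add model net host=qnetd.example.com algorithm=lms`.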

Thanks
David

On Mon, 4 Sept 2023 at 13:32, Klaus Wenninger <kwenning at redhat.com> wrote:

>
>
> On Mon, Sep 4, 2023 at 1:50 PM Andrei Borzenkov <arvidjaar at gmail.com>
> wrote:
>
>> On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger <kwenning at redhat.com>
>> wrote:
>> >
>> >
>> >
>> > On Mon, Sep 4, 2023 at 12:45 PM David Dolan <daithidolan at gmail.com>
>> wrote:
>> >>
>> >> Hi Klaus,
>> >>
>> >> With default quorum options I've performed the following on my 3 node
>> cluster
>> >>
>> >> Bring down cluster services on one node - the running services migrate
>> to another node
>> >> Wait 3 minutes
>> >> Bring down cluster services on one of the two remaining nodes - the
>> surviving node in the cluster is then fenced
>> >>
>> >> Instead of the surviving node being fenced, I had hoped that the
>> >> services would migrate to and run on that remaining node.
>> >>
>> >> Just looking for confirmation that my understanding is correct, and
>> >> whether I'm missing something.
>> >
>> >
>> > As said, I've never used it ...
>> > Well, when down to 2 nodes, LMS by definition gets into trouble, as
>> > after another outage either of them would be alone. In case of an
>> > ordered shutdown this could possibly be circumvented, though. So I
>> > guess your first attempt, enabling auto-tie-breaker, was the right
>> > idea. That way you will still have service on at least one of the
>> > nodes. So I guess what you were seeing is the right - and
>> > unfortunately the only possible - behavior.
>>
>> I still do not see where fencing comes from. Pacemaker requests
>> fencing of the missing nodes. It also may request self-fencing, but
>> not in the default settings. It is rather hard to tell what happens
>> without logs from the last remaining node.
>>
>> That said, the default action is to stop all resources, so the end
>> result is not very different :)
>>
>
> But you are of course right. The expected behaviour would be that
> the leftover node stops the resources.
> But maybe we're missing something here. It's hard to tell without
> the exact configuration, including fencing.
> Again, as already said, I don't know anything about the LMS
> implementation in corosync. In theory there are arguments both for
> suicide (but that would have to be done by pacemaker) and for
> automatically switching to some 2-node mode once the remaining
> partition is reduced to just 2, followed by a fence race (when done
> without the precautions otherwise used for 2-node clusters).
> But I guess in this case it is neither of those two.
>
> Klaus
>
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
>
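
For what it's worth, the precautions for 2-node clusters that Klaus refers
to are, in corosync terms, the two_node flag and its companion
wait_for_all - again just a sketch, not something from this thread's
configuration:

```
quorum {
    provider: corosync_votequorum
    two_node: 1       # each node keeps quorum in a 1+1 split; fencing decides the race
    wait_for_all: 1   # implied by two_node: don't form quorum at boot until both nodes are seen
}
```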

