[ClusterLabs] Antw: Re: Antw: [EXT] Correctly stop pacemaker on 2-node cluster with SBD and failed devices?

Wed Jun 16 03:11:18 EDT 2021

>>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 16.06.2021 at 09:03 in
message
<CAA91j0WwAqt+ZJ-ny5RamkeCDPbVFy+3qCzmFrbqwKWZhiY1pw at mail.gmail.com>:
> On Wed, Jun 16, 2021 at 9:05 AM Ulrich Windl
> <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>
>> >>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 15.06.2021 at 17:20 in
>> message
>> <CAA91j0XaGFRrYvum=Do3qoPFe5YUj9s_4VoEHcAH72QAHyGBew at mail.gmail.com>:
>> > We had the following situation
>> >
>> > 2-node cluster with a single SBD device (only a single external
>> > storage was available). The storage failed, so SBD lost access to the
>> > device. The cluster was still up; both nodes were running.
>>
>> Shouldn't sbd fence then (after some delay)?
>>
> 
> No. That is what pacemaker integration is for.
> 
>> >
>> > We thought that access to storage was restored, but one step was
>> > missing so devices appeared empty.
>> >
>> > At this point I tried to restart pacemaker. But as soon as I
>> > stopped pacemaker, SBD rebooted the nodes, which is logical, as quorum
>> > was now lost.
>> >
>> > How can pacemaker be stopped cleanly in this case while keeping the nodes up?
>>
>> Unconfigure the sbd devices, I guess.
>>
> 
> Do you have *practical* suggestions on how to do it online in a
> running pacemaker cluster? Can you explain how it is going to help
> given that lack of sbd device was not the problem in the first place?

My guess was that sbd timed out waiting for your failed devices. As you didn't
provide more details, it's mostly guesswork.
You can only change the SBD device while the node is down, but you can do it
node by node.
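
A minimal sketch of the per-node configuration change, assuming the device
list is kept in an SBD_DEVICE= line of the sbd sysconfig file (typically
/etc/sysconfig/sbd, or /etc/default/sbd on Debian-type systems) with multiple
devices separated by ";". The helper name set_sbd_device is hypothetical; it
only rewrites the text of the file and does not stop or start any cluster
service, so it would be run on a node whose cluster stack is already down.

```python
import re

# Matches an active (uncommented) SBD_DEVICE assignment on a line of its own.
_ACTIVE_SBD_DEVICE = re.compile(r"^SBD_DEVICE=.*$", re.MULTILINE)


def set_sbd_device(config_text, devices):
    """Return config_text with SBD_DEVICE set to the given device paths.

    Hypothetical helper for illustration: it edits text only and leaves
    every other line of the config untouched.
    """
    new_line = 'SBD_DEVICE="' + ";".join(devices) + '"'
    if _ACTIVE_SBD_DEVICE.search(config_text):
        # Replace the existing assignment; the lambda keeps backslashes
        # in device paths from being read as regex escapes.
        return _ACTIVE_SBD_DEVICE.sub(lambda m: new_line, config_text)
    # No active assignment yet: append one, adding a newline if needed.
    sep = "" if config_text.endswith("\n") or not config_text else "\n"
    return config_text + sep + new_line + "\n"
```

Applied to one node at a time, the function would take the current file
contents and the new device paths, and its result would be written back
before the node is started again.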

Regards,
Ulrich
