[ClusterLabs] Antw: [EXT] Correctly stop pacemaker on 2-node cluster with SBD and failed devices?

Wed Jun 16 03:03:27 EDT 2021

On Wed, Jun 16, 2021 at 9:05 AM Ulrich Windl
<Ulrich.Windl at rz.uni-regensburg.de> wrote:
>
> >>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 15.06.2021 at 17:20 in
> message
> <CAA91j0XaGFRrYvum=Do3qoPFe5YUj9s_4VoEHcAH72QAHyGBew at mail.gmail.com>:
> > We had the following situation
> >
> > 2-node cluster with a single device (just a single external storage was
> > available). Storage failed, so SBD lost access to the device. The cluster
> > was still up; both nodes were running.
>
> Shouldn't sbd fence then (after some delay)?
>

No. That is what pacemaker integration is for.
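To make the rule concrete, here is a deliberately simplified, hypothetical model of that decision (not sbd's actual code; the NodeState class and should_self_fence function are invented for illustration). The assumption is that with pacemaker integration, losing the shared device alone is tolerated as long as pacemaker is running and the node is quorate. Once pacemaker is stopped while the device is still unusable, nothing healthy is left to vouch for the node, which matches the reboots described below.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical snapshot of the inputs sbd considers on one node."""
    device_reachable: bool   # can the node read a valid SBD device?
    pacemaker_running: bool  # is the local pacemaker connected to sbd?
    quorate: bool            # does the node currently have quorum?


def should_self_fence(state: NodeState) -> bool:
    """Simplified model (assumption): with pacemaker integration, losing
    the device is survivable only while pacemaker is up and quorate."""
    pacemaker_healthy = state.pacemaker_running and state.quorate
    return not state.device_reachable and not pacemaker_healthy


# Storage failed but the cluster is up: the node stays up.
print(should_self_fence(NodeState(False, True, True)))
# Device still unusable and pacemaker stopped: the node self-fences.
print(should_self_fence(NodeState(False, False, False)))
```

In this toy model a working device alone is enough to keep the node up, which is why restoring the device contents (not unconfiguring it) is what would make a clean pacemaker stop safe.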

> >
> > We thought that access to storage was restored, but one step was
> > missing, so the devices appeared empty.
> >
> > At this point I tried to restart pacemaker. But as soon as I
> > stopped pacemaker, SBD rebooted the nodes, which is logical, as quorum
> > was now lost.
> >
> > How can I cleanly stop pacemaker in this case and keep the nodes up?
>
> Unconfigure the sbd devices, I guess.
>

Do you have *practical* suggestions on how to do it online in a
running pacemaker cluster? Can you explain how it is going to help,
given that the lack of an sbd device was not the problem in the first place?