[ClusterLabs] Correctly stop pacemaker on 2-node cluster with SBD and failed devices?

Klaus Wenninger kwenning at redhat.com
Wed Jun 16 02:14:44 EDT 2021


On Tue, Jun 15, 2021 at 10:41 PM Strahil Nikolov <hunter86_bg at yahoo.com>
wrote:

> Maybe you can try:
>
> while true ; do echo '0' > /proc/sys/kernel/nmi_watchdog ; sleep 1 ; done
>
> and in another shell stop pacemaker and sbd.
>
> I guess the only way to easily reproduce is with sbd over iscsi.
>
> Best Regards,
> Strahil Nikolov
>
> On Tue, Jun 15, 2021 at 21:30, Andrei Borzenkov
> <arvidjaar at gmail.com> wrote:
> On 15.06.2021 20:48, Strahil Nikolov wrote:
> > I'm using 'pcs cluster stop' (or its crm alternative), yet I'm not sure
> if it will help in this case.
> >
>
> No it won't. It will still stop pacemaker.
>
Guess this is really a delicate issue and we might think of adding
some handle here. Although of course these kinds of handles always
come with a certain amount of risk that they might be used in a
way that prevents a node from suiciding when it actually should.
Unfortunately the way 'pcs cluster stop' avoids suicides of single
nodes in larger clusters - first stop pacemaker on all nodes and only
then stop corosync, to keep quorum for long enough and to get a quick
shutdown of the rest - might not work here, as on a 2-node cluster
sbd actually isn't checking for quorum but for the number of nodes
registered with the corosync protocol pacemaker uses.
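
For illustration, roughly what that ordering looks like when done by
hand (node1 and node2 are placeholders, assuming passwordless ssh and
systemd-managed services - just a sketch, not a tested recipe):

  # first stop pacemaker on all nodes - in larger clusters this is
  # what preserves quorum while the cluster winds down
  for n in node1 node2; do ssh "$n" 'systemctl stop pacemaker'; done
  # only then stop corosync (on most setups sbd is tied to corosync
  # via systemd dependencies and goes down with it)
  for n in node1 node2; do ssh "$n" 'systemctl stop corosync'; done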

Regards,
Klaus
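
As for the two-shell sequence Strahil suggests at the top, a minimal
sketch of it might look like this (assuming systemd-managed pacemaker
and sbd units - untested, just to make the steps concrete):

  # shell 1: keep re-disabling the kernel NMI watchdog once per second
  while true; do echo '0' > /proc/sys/kernel/nmi_watchdog; sleep 1; done

  # shell 2: meanwhile stop pacemaker and sbd
  systemctl stop pacemaker
  systemctl stop sbd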

>
>
> > Most probably the safest way is to wait for the storage to be recovered,
> as without the pacemaker<->SBD communication, sbd will stop and the
> watchdog will be triggered.
>
> >
>
> What makes you think I am not aware of it?
>
> Can you suggest the steps to avoid it?
>
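
(If waiting for the storage to be recovered is the route taken, a quick
way to check that sbd can reach its devices again before stopping
anything - the device path below is only an example - would be:

  # dump the sbd header and list the message slots on the device
  sbd -d /dev/disk/by-id/example-sbd-device dump
  sbd -d /dev/disk/by-id/example-sbd-device list

If both return cleanly, the device side at least is healthy again.)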