[ClusterLabs] Correctly stop pacemaker on 2-node cluster with SBD and failed devices?
kwenning at redhat.com
Wed Jun 16 02:14:44 EDT 2021
On Tue, Jun 15, 2021 at 10:41 PM Strahil Nikolov <hunter86_bg at yahoo.com> wrote:
> Maybe you can try:
> while true ; do echo '0' > /proc/sys/kernel/nmi_watchdog ; sleep 1 ; done
> and in another shell stop pacemaker and sbd.
> I guess the only way to easily reproduce is with sbd over iscsi.
> Best Regards,
> Strahil Nikolov
> On Tue, Jun 15, 2021 at 21:30, Andrei Borzenkov
> <arvidjaar at gmail.com> wrote:
> On 15.06.2021 20:48, Strahil Nikolov wrote:
> > I'm using 'pcs cluster stop' (or its crm alternative), yet I'm not sure
> > if it will help in this case.
> No it won't. It will still stop pacemaker.
Guess this is really a delicate issue, and we might think of adding
some handle here. Although of course these kinds of handles always
come with a certain amount of risk that they might be used in a
way that prevents a node from suiciding when it actually should.
Unfortunately, the way 'pcs cluster stop' avoids suicides of single
nodes in larger clusters might not work here. It first stops pacemaker
on all nodes and only then stops corosync, which keeps quorum for long
enough and gives a quick shutdown of the rest. On a 2-node cluster,
however, sbd isn't actually checking for quorum but for the number of
nodes registered with the corosync protocol that pacemaker uses.
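To make that ordering concrete, here is a minimal sketch that only prints the commands in the sequence described: pacemaker on every node first, corosync afterwards. The node names, the use of ssh, and the systemctl unit names are assumptions for illustration. This is not how pcs implements it internally.

```shell
#!/bin/sh
# Sketch only: prints (does NOT run) a two-phase stop sequence.
# Assumptions: nodes are reachable via ssh and the services are the
# systemd units "pacemaker" and "corosync"; node names are hypothetical.

print_stop_plan() {
    # Phase 1: stop pacemaker everywhere while corosync membership is intact.
    for node in "$@"; do
        echo "ssh $node systemctl stop pacemaker"
    done
    # Phase 2: only now take corosync down on each node.
    for node in "$@"; do
        echo "ssh $node systemctl stop corosync"
    done
}

# Example with three hypothetical node names.
print_stop_plan node1 node2 node3
```

On a 2-node cluster with SBD, this ordering alone may still not avoid the self-fence, for the reason given above.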
> > Most probably the safest way is to wait for the storage to be recovered,
> > as without the pacemaker<->SBD communication, sbd will stop and the
> > watchdog will be triggered.
> What makes you think I am not aware of it?
> Can you suggest the steps to avoid it?