[ClusterLabs] [External] : Re: Fence Agent tests

Klaus Wenninger kwenning at redhat.com
Tue Nov 15 07:17:36 EST 2022


On Sat, Nov 5, 2022 at 9:45 PM Jehan-Guillaume de Rorthais via Users <
users at clusterlabs.org> wrote:

> On Sat, 5 Nov 2022 20:53:09 +0100
> Valentin Vidić via Users <users at clusterlabs.org> wrote:
>
> > On Sat, Nov 05, 2022 at 06:47:59PM +0000, Robert Hayden wrote:
> > > That was my impression as well...so I may have something wrong.  My
> > > expectation was that SBD daemon should be writing to the /dev/watchdog
> > > within 20 seconds and the kernel watchdog would self fence.
> >
> > I don't see anything unusual in the config except that pacemaker mode is
> > also enabled. This means that the cluster is providing signal for sbd even
> > when the storage device is down, for example:
> >
> > 883 ?        SL     0:00 sbd: inquisitor
> > 892 ?        SL     0:00  \_ sbd: watcher: /dev/vdb1 - slot: 0 - uuid: ...
> > 893 ?        SL     0:00  \_ sbd: watcher: Pacemaker
> > 894 ?        SL     0:00  \_ sbd: watcher: Cluster
> >
> > You can strace different sbd processes to see what they are doing at any
> > point.
>
> I suspect both watchers should detect the loss of network/communication with
> the other node.
>
> BUT, when sbd is in Pacemaker mode, it doesn't reset the node if the
> local **Pacemaker** is still quorate (via corosync). See the full chapter:
> «If Pacemaker integration is activated, SBD will not self-fence if **device**
> majority is lost [...]»
>
> https://documentation.suse.com/sle-ha/15-SP4/html/SLE-HA-all/cha-ha-storage-protect.html
>
> Would it be possible that no node is shutting down because the cluster is in
> two-node mode? Because of this mode, both would keep the quorum expecting the
> fencing to kill the other one... Except there's no active fencing here, only
> "self-fencing".
>

That seems not to be the case here, but for completeness:
sbd should recognize two-node mode automatically (upstream since some time
in 2017, iirc). Instead of checking quorum, sbd then checks for the presence
of 2 nodes in the cpg-group. I hope corosync prevents 2-node & qdevice from
being set at the same time. But even in that case I would expect unexpected
self-fencing rather than the opposite.

Klaus


>
> To verify this guess, check the corosync conf for the "two_node" parameter
> and whether both nodes still report as quorate during a network outage using:
>
>   corosync-quorumtool -s
>
> If this turns out to be a good guess, without **active** fencing, I suppose a
> cluster cannot rely on the two-node mode. I'm not sure what would be the best
> setup though.
>
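The check suggested in the quoted text could be scripted roughly as follows.
This is only a sketch: the helper names are hypothetical (not part of corosync
or sbd), and it assumes the status output contains a line of the form
"Quorate: Yes" and that "two_node: 1" appears on its own uncommented line in
the corosync configuration.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not corosync/sbd tools.

# Succeeds if the given corosync.conf has an uncommented "two_node: 1" line.
two_node_enabled() {
    grep -Eq '^[[:space:]]*two_node:[[:space:]]*1[[:space:]]*$' "$1"
}

# Reads quorum status text on stdin and succeeds if it reports
# "Quorate: Yes" (assumed format of the status output).
is_quorate() {
    grep -Eq '^Quorate:[[:space:]]*Yes[[:space:]]*$'
}

# Combines both checks: config path as argument, status text on stdin.
check_two_node_quorum() {
    conf=$1
    if two_node_enabled "$conf" && is_quorate; then
        echo "two_node set and node still quorate"
    else
        echo "guess not confirmed"
    fi
}
```

During the network outage, it would be run on each node along the lines of
`corosync-quorumtool -s | check_two_node_quorum /etc/corosync/corosync.conf`
(the path is the usual default and may differ on a given system). If both
nodes print the first message, that supports the two-node explanation.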
> Regards,