[ClusterLabs] [External] : Re: Fence Agent tests

Robert Hayden robert.h.hayden at oracle.com
Sun Nov 6 16:08:19 EST 2022


> -----Original Message-----
> From: Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
> Sent: Saturday, November 5, 2022 4:18 PM
> To: Robert Hayden <robert.h.hayden at oracle.com>
> Cc: users at clusterlabs.org
> Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
> 
> On Sat, 5 Nov 2022 20:54:55 +0000
> Robert Hayden <robert.h.hayden at oracle.com> wrote:
> 
> > > -----Original Message-----
> > > From: Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
> > > Sent: Saturday, November 5, 2022 3:45 PM
> > > To: users at clusterlabs.org
> > > Cc: Robert Hayden <robert.h.hayden at oracle.com>
> > > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
> > >
> > > On Sat, 5 Nov 2022 20:53:09 +0100
> > > Valentin Vidić via Users <users at clusterlabs.org> wrote:
> > >
> > > > On Sat, Nov 05, 2022 at 06:47:59PM +0000, Robert Hayden wrote:
> > > > That was my impression as well...so I may have something wrong.  My
> > > > expectation was that the SBD daemon needs to write to /dev/watchdog
> > > > every 20 seconds, or the kernel watchdog would self fence.
> > > >
> > > > I don't see anything unusual in the config except that pacemaker mode is
> > > > also enabled. This means that the cluster is providing a signal for sbd
> > > > even when the storage device is down, for example:
> > > >
> > > > 883 ?        SL     0:00 sbd: inquisitor
> > > > 892 ?        SL     0:00  \_ sbd: watcher: /dev/vdb1 - slot: 0 - uuid: ...
> > > > 893 ?        SL     0:00  \_ sbd: watcher: Pacemaker
> > > > 894 ?        SL     0:00  \_ sbd: watcher: Cluster
> > > >
> > > > You can strace different sbd processes to see what they are doing at
> > > > any point.
> > >
> > > I suspect both watchers should detect the loss of network/communication
> > > with the other node.
> > >
> > > BUT, when sbd is in Pacemaker mode, it doesn't reset the node if the
> > > local **Pacemaker** is still quorate (via corosync). See the full chapter:
> > > «If Pacemaker integration is activated, SBD will not self-fence if
> > > **device** majority is lost [...]»
> > > https://documentation.suse.com/sle-ha/15-SP4/html/SLE-HA-all/cha-ha-storage-protect.html
> > >
> > > Would it be possible that no node is shutting down because the cluster is
> > > in two-node mode? Because of this mode, both would keep the quorum
> > > expecting the fencing to kill the other one... Except there's no active
> > > fencing here, only "self-fencing".
> > >
> >
> > I failed to mention that I also have a Quorum Device set up to add its vote
> > to the quorum.  So two_node is not enabled.
> 
> oh, ok.
> 
> > I suspect Valentin was onto something with pacemaker keeping the watchdog
> > device updated as it thinks the cluster is ok.  Need to research and test
> > that theory out.  I will try to carve some time out next week for that.
> 
> AFAIK, Pacemaker relies strictly on SBD to deal with the watchdog. It doesn't
> feed it by itself.
> 
> In Pacemaker mode, SBD is watching the two most important parts of the
> cluster, Pacemaker and Corosync:
> 
> * the "Pacemaker watcher" of SBD connects to the CIB and check it's still
>   updated on a regular basis and the self-node is marked online.
> * the "Cluster watchers" all connect with each others using a dedicated
>   communication group in corosync ring(s).
> 
> Both watchers can report a failure to SBD that would self-stop the node.
> 
> If the network is down, I suppose the cluster watcher should complain. But I
> suspect Pacemaker somehow keeps reporting itself as quorate, thus forbidding
> SBD from killing the whole node...
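
For anyone who wants to check that from the isolated node, the quorum view can
be confirmed with the standard tools (illustrative commands, not output
captured from my test):

    corosync-quorumtool -s           # corosync's view, including "Quorate: Yes/No"
    crm_mon -1 | grep -i quorum      # pacemaker's view of quorum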

I was able to reset and re-test today.  It turns out the watchdog device
was still being fed because of the Pacemaker integration enabled by the
/etc/sysconfig/sbd entry:

SBD_PACEMAKER=yes
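
For context, the relevant part of my /etc/sysconfig/sbd looks roughly like the
snippet below (the device path and timeout value are placeholders, not copied
verbatim from my configuration):

    # shared SBD disk reached over iSCSI (placeholder path)
    SBD_DEVICE=/dev/disk/by-id/<shared-iscsi-lun>
    # kernel watchdog device that sbd feeds
    SBD_WATCHDOG_DEV=/dev/watchdog
    # illustrative timeout, in seconds
    SBD_WATCHDOG_TIMEOUT=5
    # the setting discussed in this thread
    SBD_PACEMAKER=yes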

When I set that to "no", then after running "firewall-cmd --panic-on",
sbd stopped feeding /dev/watchdog and the kernel watchdog self-fenced
the node within seconds.  Exactly what I was expecting.
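
In case it helps anyone reproduce the test, the isolation step on the node
under test was essentially the following (panic mode drops all traffic, so
both corosync and the iSCSI path to the SBD disk go away at once):

    firewall-cmd --panic-on     # isolate the node: drop all inbound/outbound traffic
    # with SBD_PACEMAKER=no the node self-fences within the watchdog timeout
    # with SBD_PACEMAKER=yes the node survives; restore networking afterwards with:
    firewall-cmd --panic-off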

When SBD_PACEMAKER was set to "yes", the loss of network connectivity to the
node would be seen and acted upon by the remote node, which would evict it
and take over ownership of the resources.  But the impacted node would just
sit there logging IO errors.  Because of the Pacemaker integration, SBD kept
feeding the /dev/watchdog device, so the node would not self-evict.  Once I
re-enabled the network, the impacted node would finally start to see
membership timeouts and eventually notice the eviction message in its SBD
device slot.  There is a brief window where the file systems are mounted on
both nodes.  The pengine does report resources as "active on 2 nodes" and
starts recovery.  At the same time, pacemaker on the impacted node has
network connectivity back, but iSCSI has yet to get IO to the disks, so it
complains that it cannot "see" the volume group.  Once iSCSI allows IO again,
the impacted node is immediately fenced.
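
The slot contents can be inspected directly with the sbd tool from either node
(the device path is a placeholder for the shared SBD disk):

    sbd -d /dev/<sbd-device> list    # each node's slot and any pending message (e.g. "reset")
    sbd -d /dev/<sbd-device> dump    # the on-disk header and configured timeouts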

I think all of that implies the risk of file system corruption is small, but still there.

Seems counter-intuitive to have SBD_PACEMAKER=no, but it is starting
to make some sense.  

If I continue with SBD, I will feel more comfortable with
SBD_PACEMAKER=no.  I just need to test my other failure scenarios to make
sure I am not breaking something else.

Thanks for all of the feedback.  

Robert

> 
> > Appreciate all of the feedback.  I have been dealing with Cluster Suite for a
> > decade+ but focused on the company's setup.  I still have lots to learn,
> > which keeps me interested.
> 
> +1
> 
> Keep us informed!
> 
> Regards,

