[ClusterLabs] [External] : Re: Fence Agent tests

Robert Hayden robert.h.hayden at oracle.com
Sat Nov 5 13:20:47 EDT 2022


> -----Original Message-----
> From: Users <users-bounces at clusterlabs.org> On Behalf Of Andrei
> Borzenkov
> Sent: Saturday, November 5, 2022 1:17 AM
> To: users at clusterlabs.org
> Subject: [External] : Re: [ClusterLabs] Fence Agent tests
> 
> On 04.11.2022 23:46, Robert Hayden wrote:
> > I am working on a Fencing agent for the Oracle Cloud Infrastructure (OCI)
> environment to complete power fencing of compute instances.  The only
> fencing setups I have seen for OCI are using SBD, but that is insufficient with
> full network interruptions since OCI uses iSCSI to write/read to the SBD disk.
> >
> 
> Out of curiosity - why is it insufficient? If cluster node is completely
> isolated, it should commit suicide. If host where cluster node is
> running is completely isolated, then you cannot do anything with this
> host anyway.

Personally, this was my first attempt with SBD, so I may be missing some core protections.  I am more
familiar with IPMILAN power fencing.  In my testing with a full network hang (firewall-cmd panic-on), I was
not getting the expected fencing results with SBD that I would get with power fencing.  Hence my long-overdue
learning of Python to take a crack at writing a fencing agent.
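
For what it's worth, the "full network hang" test I keep referring to is nothing more than the
firewall panic mode, run on the node being isolated (and lifted again afterwards):

    # Drop all inbound/outbound traffic immediately (simulates a full network hang)
    firewall-cmd --panic-on

    # ...observe the fencing behavior from the surviving node...

    # Restore traffic on the isolated node
    firewall-cmd --panic-off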

In my configuration, I am using HA-LVM (vg tags) to protect XFS file systems.  When
resources fail over, the file system moves to another node.   
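
Roughly, the tag-based setup looks like the following (the resource, vg, and mount names here are
just placeholders for illustration):

    # HA-LVM in tagging mode plus the XFS mount, grouped so they move together
    pcs resource create app_vg ocf:heartbeat:LVM-activate \
        vgname=appvg vg_access_mode=tagging tag=pacemaker --group app_group
    pcs resource create app_fs ocf:heartbeat:Filesystem \
        device=/dev/appvg/applv directory=/app fstype=xfs --group app_group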

The OCI compute instances don't have a hardware watchdog, only the software watchdog.
So when the network is completely hung (e.g. firewall-cmd panic-on), all network
traffic stops, which means I/O to the SBD device also stops.  I do not see the software
watchdog take any action in response to the network hang.   The remote node sees
the network issue and writes the reset message into the hung node's slot on the SBD
device, telling it to suicide.  But the impacted node cannot read the SBD device, so it never
gets the message.  It just sits.  Applications can still run, but they don't have access to
the disks either (which is good).  In a full network hang, the remote node waits until 2x the SBD msg-timeout
and then assumes fencing was successful.  It then attempts to move the XFS file systems over.
If the network-hung node wakes up, I now have the XFS file systems mounted on both nodes,
leading to corruption.
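
You can see the timeouts and the poison-pill message flow with the sbd tooling itself (the device
path and node name below are placeholders):

    # Dump the on-disk header; "Timeout (msgwait)" is the msg-timeout mentioned above
    sbd -d /dev/disk/by-id/oci-sbd-disk dump

    # What the surviving node effectively does: drop a reset message into the
    # hung node's slot and wait for it to be consumed
    sbd -d /dev/disk/by-id/oci-sbd-disk message node2 reset

    # Show each node's slot and any outstanding message
    sbd -d /dev/disk/by-id/oci-sbd-disk list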

This may be eliminated if I move the HA-LVM setup from vg tags to
system_id.   With vg tags, pacemaker adds a "pacemaker" tag to all controlled volume groups regardless
of which node has the vg activated.  With system_id, the node's uname is added to the vg metadata,
so each node knows which one officially has the vg activated.   I have not played with that scenario in OCI
just yet.  I am not sure if pacemaker would simply remove the other node's uname and add its own
when it attempts to move the resource.   It is on my list to test because we moved to the uname setup
with Linux 8.
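
For the system_id variant, my understanding so far (untested in OCI, names again placeholders) is
roughly:

    # /etc/lvm/lvm.conf: derive the VG owner from uname
    #   global { system_id_source = "uname" }

    # Show which node currently owns each VG
    vgs -o+systemid

    # Same resource, but the agent claims ownership via system ID instead of a tag
    pcs resource create app_vg ocf:heartbeat:LVM-activate \
        vgname=appvg vg_access_mode=system_id --group app_group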

Again, this was my first attempt with SBD, so I may have it set up completely wrong.

> 
> I am not familiar with OCI architecture so I may be missing something
> obvious here.
> 
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home:
> https://www.clusterlabs.org/

