[ClusterLabs] Question about automating cluster unfencing.

Strahil Nikolov hunter86_bg at yahoo.com
Sun Aug 29 06:36:07 EDT 2021


You can set up the system so that, when a node is fabric-fenced, it is rebooted; this allows it to 'unfence' itself afterwards.
For details, check https://access.redhat.com/solutions/3367151 or https://access.redhat.com/node/65187 (you can use a Red Hat Developer subscription to access them).

It seems that fence_mpath has had watchdog integration since a certain version, but you can still use /usr/share/cluster/fence_mpath_check (via the watchdog service and a supported watchdog device). Even if you don't have a proper watchdog device, you can use the 'softdog' module: the node is already fenced via the SAN, so even if it is not rebooted, there is no risk.
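To make the watchdog idea more concrete, here is a minimal Python sketch of the decision such a test performs, as we understand it. The node keeps running while its SCSI-3 persistent reservation key is still registered on every shared device, and asks the watchdog to reboot it once fencing has removed that key. The function names, device names and key format are hypothetical, and this is not the actual fence_mpath_check script. It also assumes the watchdog daemon reboots the node when its test program exits non-zero.

```python
from typing import Mapping, Set


def key_still_registered(registered_keys: Mapping[str, Set[str]], local_key: str) -> bool:
    """Return True only if local_key is registered on every shared device.

    registered_keys maps a (hypothetical) multipath device name to the set
    of reservation keys currently registered on it.
    """
    if not registered_keys:
        # Design choice for this sketch: with no devices to check, report
        # "fenced" instead of silently passing the test.
        return False
    return all(local_key in keys for keys in registered_keys.values())


def watchdog_test_exit_code(registered_keys: Mapping[str, Set[str]], local_key: str) -> int:
    """0 means healthy (keep running); non-zero asks the watchdog to reboot."""
    return 0 if key_still_registered(registered_keys, local_key) else 1


if __name__ == "__main__":
    # Hypothetical state after a peer removed our key from one device.
    devices = {
        "mpatha": {"0x1a2b0001", "0x1a2b0002"},
        "mpathb": {"0x1a2b0002"},
    }
    print(watchdog_test_exit_code(devices, "0x1a2b0001"))
```

Treating "key missing on any device" as fenced is deliberately conservative: a partially fenced node cannot safely keep using shared storage, so a reboot (followed by unfencing when the cluster starts again) is the safer outcome.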

Best Regards,
Strahil Nikolov

On Sat, Aug 28, 2021 at 10:14, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
On Fri, Aug 27, 2021 at 8:11 PM Gerry R Sommerville <gerry at ca.ibm.com> wrote:
>
> Hey all,
>
> From what I see in the documentation for fabric fencing, Pacemaker requires an administrator to log in to the node to manually start and unfence it after a failure.
>  https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-unfencing.html
>

This is about fabric (or resource) fencing. In this case the node is cut
off from some vital resources but remains up and running, so someone
indeed needs to intervene manually.

> My concern is that if there is an intermittent network issue, a node may get fenced and we have to wait for someone to log in to the cluster and bring the node back online. Meanwhile, the network issue may have resolved itself shortly after the node was fenced.
>
> I wonder whether there are any configurations or popular solutions that people use to automatically unfence nodes and have them rejoin the cluster.
>

Most people use stonith (node fencing), in which case the affected node
is rebooted. As long as Pacemaker is configured to start automatically
and network connectivity is restored after the reboot, the node will
rejoin the cluster automatically.

I think that with fabric fencing the node is unfenced automatically
when it reboots and attempts to rejoin the cluster (hopefully someone
can chime in here). I am not sure what happens if the node is not
rebooted but connectivity is restored.
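The reasoning in this thread can be summarized as a small Python model of when a fenced node rejoins the cluster without an administrator. It assumes, as described above, that a fabric-fenced node that is not rebooted stays cut off until someone unfences it, and that a rebooted node unfences itself and rejoins only if the cluster stack starts at boot. The names FencingStyle, NodeConfig and rejoins_without_admin are hypothetical and are not part of any Pacemaker API.

```python
from dataclasses import dataclass
from enum import Enum, auto


class FencingStyle(Enum):
    NODE = auto()    # stonith: the fenced node is rebooted
    FABRIC = auto()  # node stays up but is cut off from shared resources


@dataclass
class NodeConfig:
    style: FencingStyle
    cluster_starts_at_boot: bool          # Pacemaker enabled to start at boot
    reboot_on_fabric_fence: bool = False  # assumed watchdog-driven reboot


def rejoins_without_admin(cfg: NodeConfig, network_ok_after_event: bool) -> bool:
    """Rough model: can a fenced node rejoin with no manual step?"""
    if not network_ok_after_event:
        # Connectivity never came back, so the node cannot rejoin anyway.
        return False
    if cfg.style is FencingStyle.NODE:
        rebooted = True
    else:
        rebooted = cfg.reboot_on_fabric_fence
    # Without a reboot, a fabric-fenced node stays cut off and needs an
    # admin to unfence it. After a reboot, the node rejoins (and, under
    # fabric fencing, is assumed to be unfenced) only if the cluster
    # stack starts on boot.
    return rebooted and cfg.cluster_starts_at_boot


if __name__ == "__main__":
    print(rejoins_without_admin(NodeConfig(FencingStyle.FABRIC, True), True))
    print(rejoins_without_admin(NodeConfig(FencingStyle.FABRIC, True, True), True))
```

In this model the first call prints False and the second prints True: turning on a reboot when the node is fabric-fenced, as suggested at the top of this reply, is what closes the gap for the intermittent-network case.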