[ClusterLabs] Question about automating cluster unfencing.

Sat Aug 28 03:13:44 EDT 2021

On Fri, Aug 27, 2021 at 8:11 PM Gerry R Sommerville <gerry at ca.ibm.com> wrote:
>
> Hey all,
>
> From what I see in the documentation for fabric fencing, Pacemaker requires an administrator to log in to the node to manually start and unfence the node after some failure.
>   https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-unfencing.html
>

That page is about fabric (or resource) fencing. In that case the node
is cut off from some vital resources but remains up and running, so
someone does indeed need to intervene manually.

> My concern is that if there is an intermittent network issue, a node may get fenced and we have to wait for someone to log into the cluster and bring the node back online. Meanwhile the network issue may have resolved itself shortly after the node was fenced.
>
> I wonder if there are any configurations or popular solutions that people use to automatically unfence nodes and have them rejoin the cluster?
>

Most people use stonith (node fencing), in which case the affected node
is rebooted. As long as pacemaker is configured to start automatically
and network connectivity is restored after the reboot, the node will
rejoin the cluster automatically.
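Because automatic rejoin after a stonith reboot depends on the cluster
services starting at boot, one quick check is whether they are enabled
in systemd. This is a minimal sketch, assuming a systemd-based host
running the usual Corosync/Pacemaker stack; `autostart_status` is a
hypothetical helper name, not part of any Pacemaker tool:

```python
import subprocess
from typing import Callable, Dict, Iterable


def autostart_status(
    units: Iterable[str] = ("corosync", "pacemaker"),
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Dict[str, bool]:
    """Hypothetical helper: report whether each systemd unit starts at boot.

    Uses `systemctl is-enabled <unit>`, which prints the unit's enablement
    state (e.g. "enabled", "disabled", "static", "masked").
    The `run` parameter is injectable so the logic can be tested without systemd.
    """
    status: Dict[str, bool] = {}
    for unit in units:
        try:
            result = run(
                ["systemctl", "is-enabled", unit],
                capture_output=True,
                text=True,
                check=False,  # a non-zero exit just means "not enabled"
            )
        except FileNotFoundError:
            # systemctl is not installed: treat the unit as not enabled.
            status[unit] = False
            continue
        # Only "enabled" means the unit is wired to start at boot on its own;
        # "static" units start only when pulled in by another unit.
        status[unit] = result.stdout.strip() == "enabled"
    return status


if __name__ == "__main__":
    for unit, enabled in autostart_status().items():
        print(f"{unit}: {'starts at boot' if enabled else 'NOT enabled at boot'}")
```

If a unit reports as not enabled, `systemctl enable corosync pacemaker`
is the usual way to make it start at boot.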

I think that with fabric fencing the node is unfenced automatically
when it reboots and attempts to join the cluster (hopefully someone can
chime in here). I am not sure what happens if the node is not rebooted
but connectivity is restored.