[ClusterLabs] Fence node when network interface goes down

Klaus Wenninger kwenning at redhat.com
Mon Nov 15 07:03:20 EST 2021


On Mon, Nov 15, 2021 at 12:19 PM Andrei Borzenkov <arvidjaar at gmail.com>
wrote:

> On Mon, Nov 15, 2021 at 1:18 PM Klaus Wenninger <kwenning at redhat.com>
> wrote:
> >
> >
> >
> > On Mon, Nov 15, 2021 at 10:37 AM S Rogers <sa.rogers1342 at gmail.com> wrote:
> >>
> >> I had thought about doing that, but the cluster is then dependent on the
> >> external system, and if that external system was to go down or become
> >> unreachable for any reason then it would falsely cause the cluster to
> >> failover or worse it could even take the cluster down completely, if the
> >> external system goes down and both nodes cannot ping it.
> >
> > You wouldn't necessarily have to ban resources from nodes that can't
> > reach the external network. It would be enough to make them prefer
> > the location that has connection. So if both lose connection, one side
> > would still stay up.
> > Not to depend on something really external you might use the
> > router to your external network as ping target.
> > In case of fencing - triggered by whatever - and a potential fence-race
>
> The problem here is that nothing really triggers fencing. What happens is
>

Got that! That is why I first gave the hint on how to use ping to avoid
shutting down services.
Taking care of what happens when nodes are fenced still makes sense, though.
Imagine a fence-race that the node running the services loses, only to
have the services moved back to it once it comes up again.
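
To illustrate that fence-race, here is a minimal Python sketch. It is a toy
model only and not how Pacemaker implements any of this: the Node class, its
fields and fence_race_winner() are made-up names. The assumptions are simply
that a fencing action aimed at the node with the higher total resource
priority is held back by priority-fencing-delay, and that a node failing a
fence_heuristics_ping-style check never gets to fence at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    resource_priority: int  # hypothetical: summed priority of resources running here
    can_ping: bool = True   # hypothetical: result of a heuristic ping check


def fence_race_winner(a: Node, b: Node,
                      priority_fencing_delay: float = 0.0,
                      use_ping_heuristic: bool = False) -> Optional[str]:
    """Return the name of the node that fences first, or None if undecided."""
    top = max(a.resource_priority, b.resource_priority)
    attempts = []
    for attacker, target in ((a, b), (b, a)):
        # A node that fails the ping heuristic does not get to fence at all.
        if use_ping_heuristic and not attacker.can_ping:
            continue
        delay = 0.0
        # Fencing aimed at the node with the higher priority is delayed.
        if target.resource_priority == top and attacker.resource_priority < top:
            delay += priority_fencing_delay
        attempts.append((delay, attacker.name))
    if not attempts:
        return None  # nobody is allowed to fence
    attempts.sort()
    if len(attempts) == 2 and attempts[0][0] == attempts[1][0]:
        return None  # both fire at the same time: outcome is a coin toss
    return attempts[0][1]


active = Node("node1", resource_priority=10)   # runs the services
standby = Node("node2", resource_priority=0)

print(fence_race_winner(active, standby))                             # None
print(fence_race_winner(active, standby, priority_fencing_delay=15))  # node1
```

Without a delay the model cannot tell who wins, which is exactly the case
where the active node may lose. With the delay the active node wins. If
node1 is created with can_ping=False and use_ping_heuristic=True is passed,
node2 wins instead, because node1 is no longer allowed to fence.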

Klaus


>
> - two postgres lose connection over external network, but cluster
> nodes retain connectivity over another network
> - postgres RA compares "latest timestamp" when selecting the best node
> to fail over to
> - primary postgres has better timestamp, so RA simply does not
> consider secondary as suitable for (automatic) failover
>
> The only solution here - as long as fencing node on external
> connectivity loss is acceptable - is modifying ethmonitor RA to fail
> monitor operation in this case.
>
> > you might use the rather new feature priority-fencing-delay (give the node
> > that is running valuable resources a benefit in the race) or go for
> > fence_heuristics_ping (pseudo fence-resource that together with a
> > fencing-topology prevents the node without access to a certain IP
> > from fencing the other node).
> >
> > https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html
> >
> > https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py
> >
> > Klaus
> > _______________________________________________
> >>
> >> Manage your subscription:
> >> https://lists.clusterlabs.org/mailman/listinfo/users
> >>
> >> ClusterLabs home: https://www.clusterlabs.org/
> >>