<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 15/11/2021 12:03, Klaus Wenninger
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CALrDAo0Fb3GijVBFnXWZ6A7kHoqUMwD_m1X-Y5_doJnY9n+98g@mail.gmail.com">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Nov 15, 2021 at
12:19 PM Andrei Borzenkov <<a
href="mailto:arvidjaar@gmail.com" moz-do-not-send="true"
class="moz-txt-link-freetext">arvidjaar@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">On Mon, Nov 15, 2021 at
1:18 PM Klaus Wenninger <<a
href="mailto:kwenning@redhat.com" target="_blank"
moz-do-not-send="true" class="moz-txt-link-freetext">kwenning@redhat.com</a>>
wrote:<br>
><br>
><br>
><br>
> On Mon, Nov 15, 2021 at 10:37 AM S Rogers <<a
href="mailto:sa.rogers1342@gmail.com" target="_blank"
moz-do-not-send="true" class="moz-txt-link-freetext">sa.rogers1342@gmail.com</a>>
wrote:<br>
>><br>
>> I had thought about doing that, but the cluster is
then dependent on the<br>
>> external system, and if that external system was to
go down or become<br>
>> unreachable for any reason, then it could falsely
cause the cluster to<br>
>> fail over or, worse, even take the cluster
down completely if the<br>
>> external system goes down and neither node can
ping it.<br>
><br>
> You wouldn't necessarily have to ban resources from
nodes that can't<br>
> reach the external network. It would be enough to make
them prefer<br>
> the location that has connectivity. That way, if both lose
connectivity, one side<br>
> would still stay up.<br>
> To avoid depending on something truly external, you might
use the<br>
> router to your external network as the ping target.<br>
> In case of fencing - triggered by whatever - and a
potential fence-race<br>
<br>
The problem here is that nothing really triggers fencing.
What happens is<br>
</blockquote>
<div><br>
</div>
<div>Got that! That is why I first gave the hint on how to use
ping</div>
<div>to avoid shutting down services.</div>
<div>Taking care of what happens when nodes are fenced still
makes sense.</div>
<div>Imagine a fence-race that the node running the services
loses, only</div>
<div>to have the services moved back to it once it comes
up again.</div>
<div><br>
</div>
<div>Klaus</div>
</div>
</div>
</blockquote>
Thanks, I wasn't aware of priority-fencing-delay. While it doesn't
solve this problem, I can still use it to improve the fencing
behaviour of the cluster in general.<br>
<p>Unfortunately, in some situations this cluster will be deployed
in a completely isolated network, so there may not even be a router
that we can use as a ping target, and we can't guarantee the
presence of any other system on the network that we could reliably
use as a ping target.</p>
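<p>For reference, the difference Klaus describes between banning nodes without connectivity and merely preferring nodes with connectivity can be sketched as a toy scoring model. This is a hypothetical, heavily simplified illustration, not Pacemaker's actual placement logic: the function name <code>place</code>, the preference score of 100 and the tie handling are assumptions (real placement also weighs resource-stickiness, other constraints and utilization).</p>
<pre>
```python
import math
from typing import Dict, Optional

# Stands in for Pacemaker's -INFINITY score (an absolute ban).
NEG_INF = -math.inf


def place(connected: Dict[str, bool], ban: bool,
          prefer_score: int = 100) -> Optional[str]:
    """Pick a node for the resource in a simplified scoring model.

    connected maps node name to whether it can reach the ping target.
    ban=True:  nodes without connectivity get -INFINITY (banned).
    ban=False: connected nodes only get a positive preference.
    Returns None when every node is banned (the resource stops).
    """
    scores = {}
    for node, ok in connected.items():
        if ok:
            scores[node] = prefer_score
        else:
            scores[node] = NEG_INF if ban else 0
    allowed = {node: s for node, s in scores.items() if s != NEG_INF}
    if not allowed:
        return None
    # On a tie, the first node listed wins in this toy model; Pacemaker
    # would instead keep the resource where it already runs (stickiness).
    return max(allowed, key=allowed.get)
```
</pre>
<p>With both nodes unable to ping, <code>place({"node1": False, "node2": False}, ban=True)</code> returns <code>None</code> (the resource stops everywhere), while the same call with <code>ban=False</code> still returns a node, which is the behaviour Klaus is pointing at.</p>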
<blockquote type="cite"
cite="mid:CALrDAo0Fb3GijVBFnXWZ6A7kHoqUMwD_m1X-Y5_doJnY9n+98g@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<br>
- the two postgres instances lose connectivity over the external network,
but the<br>
cluster nodes retain connectivity over another network<br>
- the postgres RA compares the "latest timestamp" when selecting the
best node<br>
to fail over to<br>
- the primary postgres has the better timestamp, so the RA simply does
not<br>
consider the secondary suitable for (automatic) failover<br>
<br>
The only solution here, as long as fencing a node on loss of external<br>
connectivity is acceptable, is to modify the ethmonitor RA so that
the<br>
monitor operation fails in this case.<br>
</blockquote>
</div>
</div>
</blockquote>
<p>I was hoping to find a way to achieve the desired outcome without
resorting to a custom RA, but it does appear to be the only
solution.</p>
<p>This may not be the right audience, but does anyone know whether it
would be a viable change to add a parameter to the ethmonitor
RA that lets users choose whether the monitor operation fails when
the interface is down? (i.e., a 'monitor_force_fail' parameter
that, when set to true, causes the monitor operation to fail if
it determines that the interface is down)</p>
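<p>To make the proposal concrete, here is a minimal Python sketch of the decision the monitor action would make. It assumes, as Andrei's reply implies, that ethmonitor currently only records link state in a node attribute and reports success regardless; <code>monitor_force_fail</code> is the hypothetical parameter proposed above, and <code>ethmonitor_monitor</code> is an illustrative model, not code from the real agent (which is a shell script).</p>
<pre>
```python
# Standard OCF exit codes (values as defined by the OCF resource agent API).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def ethmonitor_monitor(interface_up: bool,
                       monitor_force_fail: bool = False) -> int:
    """Return the OCF exit code for a monitor operation.

    monitor_force_fail=False (assumed current behaviour): link state is
    only published as a node attribute, so the monitor succeeds either way.
    monitor_force_fail=True (hypothetical parameter): a down interface
    fails the monitor, handing recovery to Pacemaker (its on-fail
    policy, which may include fencing).
    """
    if interface_up or not monitor_force_fail:
        return OCF_SUCCESS
    return OCF_ERR_GENERIC
```
</pre>
<p>Here <code>ethmonitor_monitor(False)</code> returns 0, whereas <code>ethmonitor_monitor(False, monitor_force_fail=True)</code> returns 1, so the default keeps today's behaviour and only an explicit opt-in turns link loss into a resource failure.</p>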
<p>Being relatively new to Pacemaker, I don't know whether this goes
against RA conventions/practices.<br>
</p>
<blockquote type="cite"
cite="mid:CALrDAo0Fb3GijVBFnXWZ6A7kHoqUMwD_m1X-Y5_doJnY9n+98g@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<br>
> you might use the rather new feature
priority-fencing-delay (which gives the node<br>
> running valuable resources an advantage in the
race) or go for<br>
> fence_heuristics_ping (a pseudo fence resource that,
together with a<br>
> fencing topology, prevents the node without access to a
certain IP<br>
> from fencing the other node).<br>
> <a
href="https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html"
rel="noreferrer" target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html</a><br>
> <a
href="https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py"
rel="noreferrer" target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py</a><br>
><br>
> Klaus<br>
> _______________________________________________<br>
>><br>
>> Manage your subscription:<br>
>> <a
href="https://lists.clusterlabs.org/mailman/listinfo/users"
rel="noreferrer" target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>
>><br>
>> ClusterLabs home: <a
href="https://www.clusterlabs.org/" rel="noreferrer"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">https://www.clusterlabs.org/</a><br>
>><br>
</blockquote>
</div>
</div>
</blockquote>
</body>
</html>