<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 15/11/2021 12:03, Klaus Wenninger

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CALrDAo0Fb3GijVBFnXWZ6A7kHoqUMwD_m1X-Y5_doJnY9n+98g@mail.gmail.com">

      <meta http-equiv="content-type" content="text/html; charset=UTF-8">

      <div dir="ltr">

        <div dir="ltr"><br>

        </div>

        <br>

        <div class="gmail_quote">

          <div dir="ltr" class="gmail_attr">On Mon, Nov 15, 2021 at

            12:19 PM Andrei Borzenkov &lt;<a

              href="mailto:arvidjaar@gmail.com" moz-do-not-send="true"

              class="moz-txt-link-freetext">arvidjaar@gmail.com</a>&gt;

            wrote:<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">On Mon, Nov 15, 2021 at

            1:18 PM Klaus Wenninger &lt;<a

              href="mailto:kwenning@redhat.com" target="_blank"

              moz-do-not-send="true" class="moz-txt-link-freetext">kwenning@redhat.com</a>&gt;

            wrote:<br>

            ><br>

            ><br>

            ><br>

            > On Mon, Nov 15, 2021 at 10:37 AM S Rogers &lt;<a

              href="mailto:sa.rogers1342@gmail.com" target="_blank"

              moz-do-not-send="true" class="moz-txt-link-freetext">sa.rogers1342@gmail.com</a>&gt;

            wrote:<br>

            >><br>

            >> I had thought about doing that, but the cluster is

            then dependent on the<br>

            >> external system, and if that external system was to

            go down or become<br>

            >> unreachable for any reason then it would falsely

            cause the cluster to<br>

            >> failover or worse it could even take the cluster

            down completely, if the<br>

            >> external system goes down and both nodes cannot

            ping it.<br>

            ><br>

            > You wouldn't necessarily have to ban resources from

            nodes that can't<br>

            > reach the external network. It would be enough to make

            them prefer<br>

            > the location that has connection. So if both lose

            connection, one side<br>

            > would still stay up.<br>

            > Not to depend on something really external you might

            use the<br>

            > router to your external network as ping target.<br>

            > In case of fencing - triggered by whatever - and a

            potential fence-race<br>

            <br>

            The problem here is that nothing really triggers fencing.

            What happens, is<br>

          </blockquote>

          <div><br>

          </div>

          <div>Got that! Which is why I gave the hint how to prevent

            shutting down</div>

          <div>services with ping first.</div>

          <div>Taking care of what happens when nodes are fenced still

            makes sense.</div>

          <div>Imagine a fence-race where the node running services

            loses, only</div>

          <div>to have the services moved back afterwards when it comes

            up again.</div>

          <div><br>

          </div>

          <div>Klaus</div>

        </div>

      </div>

    </blockquote>

    Thanks, I wasn't aware of priority-fencing-delay. While it doesn't

    solve this problem, I can still use it to improve the fencing

    behaviour of the cluster in general.<br>

    <p>Unfortunately, in some situations this cluster will be deployed

      in a completely isolated network, so there may not even be a router

      to use as a ping target, and we can't guarantee that any other

      system on the network will be reliably available for that

      purpose.</p>

    <blockquote type="cite"

cite="mid:CALrDAo0Fb3GijVBFnXWZ6A7kHoqUMwD_m1X-Y5_doJnY9n+98g@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_quote">

          <div> </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">

            <br>

            - two postgres lose connection over external network, but

            cluster<br>

            nodes retain connectivity over another network<br>

            - postgres RA compares "latest timestamp" when selecting the

            best node<br>

            to fail over to<br>

            - primary postgres has better timestamp, so RA simply does

            not<br>

            consider secondary as suitable for (automatic) failover<br>

            <br>

            The only solution here - as long as fencing node on external<br>

            connectivity loss is acceptable - is modifying ethmonitor RA

            to fail<br>

            monitor operation in this case.<br>

          </blockquote>

        </div>

      </div>

    </blockquote>

    <p>I was hoping to find a way to achieve the desired outcome without

      resorting to a custom RA, but it does appear to be the only

      solution.</p>

    <p>This may not be the right audience, but does anyone know whether

      it would be viable to add a parameter to the ethmonitor RA that

      lets users change what the monitor operation does when the

      interface is down? I am thinking of something like a

      'monitor_force_fail' parameter that, when set to true, causes the

      monitor operation to fail if it determines the interface is

      down.</p>
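
    <p>To make the idea concrete, here is a rough sketch of the monitor
      logic I have in mind. It is written in Python purely for
      illustration (the real ethmonitor agent is a shell script),
      'monitor_force_fail' is the hypothetical parameter from my
      question above, and reading the sysfs operstate file is my own
      simplification rather than how ethmonitor actually checks the
      link. As far as I understand it, the current agent reports success
      from monitor and only publishes the link state as a node
      attribute.</p>

    <pre>
```python
from pathlib import Path

# Standard OCF return codes used by resource agents.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def link_is_up(interface, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports the interface as operationally up.

    Assumption: operstate is a good enough signal for this sketch; the
    real ethmonitor agent performs more thorough checks.
    """
    state_file = Path(sysfs_root) / interface / "operstate"
    try:
        return state_file.read_text().strip() == "up"
    except OSError:
        # A missing or unreadable interface is treated as down.
        return False


def monitor(interface, monitor_force_fail=False, sysfs_root="/sys/class/net"):
    """Sketch of a monitor action with the hypothetical monitor_force_fail."""
    if link_is_up(interface, sysfs_root):
        return OCF_SUCCESS
    if monitor_force_fail:
        # Proposed behaviour: report a failure so the cluster can react.
        return OCF_ERR_GENERIC
    # Default behaviour as I understand it: the monitor still succeeds and
    # the link state would only be recorded as a node attribute.
    return OCF_SUCCESS
```
</pre>

    <p>If something along these lines were acceptable, I assume the
      monitor operation could then be configured with on-fail=fence so
      that a down interface leads to the node being fenced, which is
      what Andrei's suggestion seems to require.</p>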

    <p>Being relatively new to Pacemaker, I don't know whether this goes

      against RA conventions/practices.<br>

    </p>

    <blockquote type="cite"

cite="mid:CALrDAo0Fb3GijVBFnXWZ6A7kHoqUMwD_m1X-Y5_doJnY9n+98g@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_quote">

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">

            <br>

            > you might use the rather new feature

            priority-fencing-delay (give the node<br>

            > that is running valuable resources a benefit in the

            race) or go for<br>

            > fence_heuristics_ping (pseudo fence-resource that

            together with a<br>

            > fencing-topology prevents the node without access to a

            certain IP<br>

            > from fencing the other node).<br>

            > <a

href="https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html"

              rel="noreferrer" target="_blank" moz-do-not-send="true"

              class="moz-txt-link-freetext">https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html</a><br>

            > <a

href="https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py"

              rel="noreferrer" target="_blank" moz-do-not-send="true"

              class="moz-txt-link-freetext">https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py</a><br>

            ><br>

            > Klaus<br>

            > _______________________________________________<br>

            >><br>

            >> Manage your subscription:<br>

            >> <a

              href="https://lists.clusterlabs.org/mailman/listinfo/users"

              rel="noreferrer" target="_blank" moz-do-not-send="true"

              class="moz-txt-link-freetext">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>

            >><br>

            >> ClusterLabs home: <a

              href="https://www.clusterlabs.org/" rel="noreferrer"

              target="_blank" moz-do-not-send="true"

              class="moz-txt-link-freetext">https://www.clusterlabs.org/</a><br>

            >><br>


          </blockquote>

        </div>

      </div>

      <br>


    </blockquote>

  </body>

</html>