<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <div class="moz-cite-prefix">On 15/11/2021 12:03, Klaus Wenninger
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CALrDAo0Fb3GijVBFnXWZ6A7kHoqUMwD_m1X-Y5_doJnY9n+98g@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr"><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Mon, Nov 15, 2021 at
            12:19 PM Andrei Borzenkov <<a
              href="mailto:arvidjaar@gmail.com" moz-do-not-send="true"
              class="moz-txt-link-freetext">arvidjaar@gmail.com</a>>
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">On Mon, Nov 15, 2021 at
            1:18 PM Klaus Wenninger <<a
              href="mailto:kwenning@redhat.com" target="_blank"
              moz-do-not-send="true" class="moz-txt-link-freetext">kwenning@redhat.com</a>>
            wrote:<br>
            ><br>
            ><br>
            ><br>
            > On Mon, Nov 15, 2021 at 10:37 AM S Rogers <<a
              href="mailto:sa.rogers1342@gmail.com" target="_blank"
              moz-do-not-send="true" class="moz-txt-link-freetext">sa.rogers1342@gmail.com</a>>
            wrote:<br>
            >><br>
            >> I had thought about doing that, but the cluster is
            then dependent on the<br>
            >> external system, and if that external system was to
            go down or become<br>
            >> unreachable for any reason then it would falsely
            cause the cluster to<br>
            >> failover or worse it could even take the cluster
            down completely, if the<br>
            >> external system goes down and both nodes cannot
            ping it.<br>
            ><br>
            > You wouldn't necessarily have to ban resources from
            nodes that can't<br>
            > reach the external network. It would be enough to make
            them prefer<br>
            > the location that has a connection. So if both lose their
            connection, one side<br>
            > would still stay up.<br>
            > Not to depend on something really external you might
            use the<br>
            > router to your external network as ping target.<br>
            > In case of fencing - triggered by whatever - and a
            potential fence-race<br>
            <br>
            The problem here is that nothing really triggers fencing.
            What happens, is<br>
          </blockquote>
          <div><br>
          </div>
          <div>Got that! Which is why I gave the hint how to prevent
            shutting down</div>
          <div>services with ping first.</div>
          <div>Taking care of what happens when nodes are fenced still
            makes sense.</div>
          <div>Imagine a fence-race that the node running services
            loses, only</div>
          <div>to get the services moved back once it comes up
            again.</div>
          <div><br>
          </div>
          <div>Klaus</div>
        </div>
      </div>
    </blockquote>
    Thanks, I wasn't aware of priority-fencing-delay. While it doesn't
    solve this problem, I can still use it to improve the fencing
    behaviour of the cluster in general.<br>
    <p>Unfortunately, in some situations this cluster will be deployed
      in a completely isolated network so there may not even be a router
      that we can use as a ping target, and we can't guarantee the
      presence of any other system on the network that we could reliably
      use as a ping target.</p>
    <blockquote type="cite"
cite="mid:CALrDAo0Fb3GijVBFnXWZ6A7kHoqUMwD_m1X-Y5_doJnY9n+98g@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <br>
            - the two postgres instances lose their connection over the
            external network, but the cluster<br>
            nodes retain connectivity over another network<br>
            - postgres RA compares "latest timestamp" when selecting the
            best node<br>
            to fail over to<br>
            - primary postgres has better timestamp, so RA simply does
            not<br>
            consider secondary as suitable for (automatic) failover<br>
            <br>
            The only solution here - as long as fencing a node on loss of
            external<br>
            connectivity is acceptable - is to modify the ethmonitor RA
            so that its<br>
            monitor operation fails in this case.<br>
          </blockquote>
        </div>
      </div>
    </blockquote>
    <p>I was hoping to find a way to achieve the desired outcome without
      resorting to a custom RA, but it does appear to be the only
      solution.</p>
    <p>This may not be the right audience, but does anyone know whether
      it would be viable to add a parameter to the ethmonitor RA that
      lets users change what the monitor operation returns when the
      interface is down? (i.e., a 'monitor_force_fail' parameter that,
      when set to true, causes the monitor operation to fail if it
      determines the interface is down)</p>
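    <p>To make the idea concrete, here is a minimal Python sketch of the
      decision such a parameter would control. It is not the ethmonitor
      code (the real agent is not written in Python); the function names
      and the 'monitor_force_fail' parameter are hypothetical, and it
      assumes that the monitor currently returns success even when the
      interface is down. Only the two OCF exit codes and the Linux
      operstate file are taken as given.</p>
<pre>
```python
# Sketch only: 'monitor_force_fail' is the hypothetical parameter proposed
# above; it is not an existing ethmonitor option.

OCF_SUCCESS = 0      # standard OCF exit code: operation succeeded
OCF_ERR_GENERIC = 1  # standard OCF exit code: generic failure


def read_operstate(interface):
    """Return the kernel's operstate string for a Linux interface."""
    # Note: some virtual interfaces report "unknown" rather than "up".
    with open("/sys/class/net/%s/operstate" % interface) as f:
        return f.read().strip()


def monitor(interface, monitor_force_fail=False, read_state=read_operstate):
    """Decide the monitor exit code for one interface."""
    link_up = read_state(interface) == "up"
    if link_up or not monitor_force_fail:
        # Assumed default: report success and leave any reaction to
        # constraints defined elsewhere in the cluster configuration.
        return OCF_SUCCESS
    # Proposed behaviour: a down interface makes the monitor fail, so the
    # cluster treats it like any other failed resource.
    return OCF_ERR_GENERIC
```
</pre>
    <p>With the flag left at its default the result is always success,
      so existing configurations would be unaffected; only clusters that
      opt in would see a failed monitor when the link goes down.</p>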
    <p>Being relatively new to Pacemaker, I don't know whether this goes
      against RA conventions/practices.<br>
    </p>
    <blockquote type="cite"
cite="mid:CALrDAo0Fb3GijVBFnXWZ6A7kHoqUMwD_m1X-Y5_doJnY9n+98g@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <br>
            > you might use the rather new feature
            priority-fencing-delay (give the node<br>
            > that is running valuable resources a benefit in the
            race) or go for<br>
            > fence_heuristics_ping (pseudo fence-resource that
            together with a<br>
            > fencing-topology prevents the node without access to a
            certain IP<br>
            > from fencing the other node).<br>
            > <a
href="https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html"
              rel="noreferrer" target="_blank" moz-do-not-send="true"
              class="moz-txt-link-freetext">https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html</a><br>
            > <a
href="https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py"
              rel="noreferrer" target="_blank" moz-do-not-send="true"
              class="moz-txt-link-freetext">https://github.com/ClusterLabs/fence-agents/blob/master/agents/heuristics_ping/fence_heuristics_ping.py</a><br>
            ><br>
            > Klaus<br>
            > _______________________________________________<br>
            >><br>
            >> Manage your subscription:<br>
            >> <a
              href="https://lists.clusterlabs.org/mailman/listinfo/users"
              rel="noreferrer" target="_blank" moz-do-not-send="true"
              class="moz-txt-link-freetext">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>
            >><br>
            >> ClusterLabs home: <a
              href="https://www.clusterlabs.org/" rel="noreferrer"
              target="_blank" moz-do-not-send="true"
              class="moz-txt-link-freetext">https://www.clusterlabs.org/</a><br>
            >><br>
          </blockquote>
        </div>
      </div>
      <br>
    </blockquote>
  </body>
</html>