<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>


    <div class="moz-cite-prefix">On 21/03/2023 at 11:00, Jehan-Guillaume
      de Rorthais wrote:<br>

    </div>

    <blockquote type="cite" cite="mid:20230321110033.5f0df130@karst">

      <pre class="moz-quote-pre" wrap="">Hi,

On Tue, 21 Mar 2023 09:33:04 +0100
Jérôme BECOT <a class="moz-txt-link-rfc2396E" href="mailto:jerome.becot@deveryware.com">&lt;jerome.becot@deveryware.com&gt;</a> wrote:
</pre>

      <blockquote type="cite">

        <pre class="moz-quote-pre" wrap="">We have several clusters running for different Zabbix components. Some
of these clusters consist of 2 Zabbix proxies, where the nodes run MySQL,
the Zabbix proxy server and a VIP, and a corosync-qdevice.
</pre>

      </blockquote>

      <pre class="moz-quote-pre" wrap="">
I'm not sure I understand your topology. The corosync-qdevice is not supposed
to be on a cluster node. It is supposed to be on a remote node and provide some
quorum features to one or more clusters without setting up the whole
pacemaker/corosync stack.</pre>

    </blockquote>

    I was not clear: the qdevice is deployed on a remote node, as
    intended.<br>

    <blockquote type="cite" cite="mid:20230321110033.5f0df130@karst">

      <pre class="moz-quote-pre" wrap="">

</pre>

      <blockquote type="cite">

        <pre class="moz-quote-pre" wrap="">The MySQL servers are always up to replicate, and are configured in
Master/Master (each replicates from the other, but only one is supposed to
be updated, by the proxy running on the master node).
</pre>

      </blockquote>

      <pre class="moz-quote-pre" wrap="">
Why do you bother with Master/Master when a simple (I suppose, I'm not a MySQL
cluster guy) Primary-Secondary topology or even shared storage would be
enough and would keep your logic (writes on one node only) safe from incidents,
failures, errors, etc.?

HA must be as simple as possible. Remove useless parts when you can.</pre>

    </blockquote>

    Shared storage just moves the complexity somewhere else. A classic
    Primary/Secondary setup could be an option if Pacemaker manages to
    start the client on the slave node, but it would become
    Master/Master during a split brain.<br>

    <blockquote type="cite" cite="mid:20230321110033.5f0df130@karst">

      <pre class="moz-quote-pre" wrap="">

</pre>

      <blockquote type="cite">

        <pre class="moz-quote-pre" wrap="">One cluster is prone to frequent sync errors, with duplicate-entry
errors in SQL. When I look at the logs, I can see:

Mar 21 09:11:41 zabbix-proxy-01 pacemaker-controld  [948] (pcmk_cpg_membership)     info: Group crmd event 89: zabbix-proxy-02 (node 2 pid 967) left via cluster exit

and, within the next second, a rejoin. The same messages appear in the
other node's logs, suggesting a split brain, which should not happen
because there is a quorum device.
</pre>

      </blockquote>

      <pre class="moz-quote-pre" wrap="">
Could it be that your SQL sync errors and the leave/join issues are
correlated and are both symptoms of another failure? Look in your logs for some
explanation of why the node decided to leave the cluster.</pre>

    </blockquote>

    <p>My guess is that high network latency may be causing the
      disjoin, so that the Zabbix proxy gets started on both nodes,
      which causes the replication errors. The proxy is configured to
      use the VIP, which is up locally on each node because of the
      split brain.</p>
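    <p>As a first check of this guess, a small script could measure how
      long each disjoin lasts, by pairing every "left" event with the
      next "joined" event for the same node in the pacemaker-controld
      log. This is a minimal Python sketch, assuming the log format of
      the line quoted above and assuming that the rejoin line reads
      "... (node N pid P) joined ..."; the function name
      <code>disjoin_durations</code> and the fixed year are hypothetical
      choices for illustration, not part of Pacemaker.</p>
<pre>
```python
import re
from datetime import datetime

# Assumed shape of pacemaker-controld membership lines, based on the one
# quoted in this thread; the "joined" wording of the rejoin line is an assumption.
EVENT_RE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ pacemaker-controld.*?"
    r"\(pcmk_cpg_membership\).*?Group crmd event \d+: (\S+) "
    r"\(node \d+ pid \d+\) (left|joined)"
)

# Syslog-style timestamps carry no year; a fixed year is assumed because
# only the differences between timestamps matter here.
ASSUMED_YEAR = 2023


def parse_ts(text):
    """Parse a timestamp such as 'Mar 21 09:11:41'."""
    return datetime.strptime(f"{ASSUMED_YEAR} {text}", "%Y %b %d %H:%M:%S")


def disjoin_durations(lines):
    """Return (node, seconds) pairs, one per 'left' event followed by a 'joined' event."""
    left_at = {}  # maps a node name to the time of its last 'left' event
    durations = []
    for line in lines:
        match = EVENT_RE.search(line)
        if not match:
            continue  # not a membership line
        ts, node, event = match.groups()
        when = parse_ts(ts)
        if event == "left":
            left_at[node] = when
        elif node in left_at:
            gap = when - left_at.pop(node)
            durations.append((node, gap.total_seconds()))
    return durations
```
</pre>
    <p>Feeding it the lines of the Pacemaker log file (its location
      depends on the distribution) gives one entry per disjoin, which
      can then be lined up with latency measurements and with the
      timestamps of the SQL duplicate-entry errors.</p>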

    <p>This is why I'm asking for guidance on how to check and monitor
      these nodes, to find out whether temporary network latency is
      causing the disjoin.</p>
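    <p>One way to look for temporary latency between the two nodes is to
      probe the peer continuously and keep a timestamped record of slow
      or failed probes, to compare with the disjoin times in the logs.
      The Python sketch below times TCP connections to the peer; it
      assumes MySQL listens on its default port 3306 on both nodes (any
      reachable TCP port would do), and the names
      <code>connect_latency</code> and <code>probe</code> are
      hypothetical. Corosync traffic normally goes over UDP, so this only
      approximates what the cluster itself experiences.</p>
<pre>
```python
import socket
import time
from datetime import datetime


def connect_latency(host, port, timeout=1.0):
    """Time a TCP connection to host:port in seconds; return None on failure or timeout."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return None


def probe(host, port, count, interval=0.5, threshold=0.1, timeout=1.0):
    """Probe host:port count times and yield (timestamp, latency) for each
    probe that failed (latency None) or took longer than threshold seconds."""
    for _ in range(count):
        latency = connect_latency(host, port, timeout)
        if latency is None or latency > threshold:
            yield datetime.now().isoformat(timespec="milliseconds"), latency
        time.sleep(interval)
```
</pre>
    <p>For example, running <code>probe("zabbix-proxy-02", 3306,
      count=7200)</code> on zabbix-proxy-01 covers about an hour at the
      default 0.5 s interval; printing what it yields shows whether slow
      or failed probes cluster around the disjoin events. The 100 ms
      threshold is an arbitrary starting point.</p>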

    <blockquote type="cite" cite="mid:20230321110033.5f0df130@karst">

      <pre class="moz-quote-pre" wrap="">

</pre>

      <blockquote type="cite">

        <pre class="moz-quote-pre" wrap="">Can you help me troubleshoot this? I can provide any
log/configuration required in the process, so let me know.
I'd also like to ask whether there is some configuration that can be done
to postpone the service start on the other node by two or three seconds, as
a quick workaround?
</pre>

      </blockquote>

      <pre class="moz-quote-pre" wrap="">

How would it be a workaround?</pre>

    </blockquote>

    Because even if the network issues persist, the proxy would not be
    started on the slave node: each disjoin lasts less than two seconds.
    Fixing the network is the real solution (but it is not in my power);
    delaying the service start in this case looks like a decent
    workaround to me.<br>
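    <p>The reasoning behind this workaround can be written down as a
      small decision function: wait a few seconds after the peer leaves,
      re-check, and only start the proxy if the peer is still gone, so a
      disjoin shorter than the delay never triggers a second instance.
      This is a hypothetical Python sketch of that logic only, not a
      Pacemaker feature or setting; <code>should_start_locally</code> and
      the <code>peer_is_back</code> callback are illustrative names, and
      how the peer is actually checked is left open.</p>
<pre>
```python
import time


def should_start_locally(peer_is_back, delay=3.0, sleep=time.sleep):
    """Decide whether to start the service after the peer left the cluster.

    Waits delay seconds, then calls peer_is_back() (a caller-supplied check).
    The service is only started if the peer has not come back, so a disjoin
    shorter than delay never starts a second instance.
    """
    sleep(delay)
    return not peer_is_back()
```
</pre>
    <p>With disjoins lasting under two seconds, a delay of two to three
      seconds would make this return False each time, while a real node
      failure would still lead to a start, only a few seconds later. The
      trade-off is a longer failover in that real failure case.</p>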

    <blockquote type="cite" cite="mid:20230321110033.5f0df130@karst">

      <pre class="moz-quote-pre" wrap="">

Regards,

</pre>

    </blockquote>

    <div class="moz-signature">-- <br>


      <div class="moz-signature"><b><span style="color:#002060">Jérôme
            BECOT</span></b><br>

        <span style="color:#002060">DevOps Infrastructure Engineer</span><br>

        <br>

        <span style="color:#002060">Landline: </span> <span
          style="color:#002060;mso-fareast-language:FR">01 82 28 37 06</span><br>

        <span style="color:#002060">Mobile: +33 757 173 193</span><br>

        <span style="color:#002060">Deveryware - 43 rue Taitbout - 75009

          PARIS</span><br>

        <a moz-do-not-send="true" href="https://www.deveryware.com"><span
            style="color:#002060">https://www.deveryware.com</span></a></div>

      <div class="moz-signature">
        <img moz-do-not-send="false"
          src="cid:part1.OjO7RCO0.zpV0gi8l@deveryware.com"
          alt="Deveryware_Logo" width="402" height="107"></div>

    </div>

  </body>

</html>