<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">Quorum doesn't prevent split-brains;
      stonith (fencing) does. <br>
      <br>
      <a class="moz-txt-link-freetext" href="https://www.alteeve.com/w/The_2-Node_Myth">https://www.alteeve.com/w/The_2-Node_Myth</a><br>
      <br>
      There is no way to use quorum alone to avoid a potential
      split-brain. You might be able to make it less likely with enough
      effort, but you can never prevent it.<br>
      <br>
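      <p>A minimal sketch of the two-node case, assuming simple majority
        voting. The function names are hypothetical, and the
        <code>two_node</code> flag only loosely models the idea of letting
        a lone node in a two-node cluster stay quorate; it is not taken
        from any cluster code.</p>
<pre>
```python
def has_quorum(votes_seen, total_votes):
    """Majority rule: quorate only with more than half of all votes."""
    return votes_seen * 2 > total_votes


def partition_view(partitions, total_votes, two_node=False):
    """Return, per partition, whether it believes it may run resources.

    two_node=True models a two-node special case in which a lone node
    is still treated as quorate (an assumption of this sketch).
    """
    views = []
    for votes_seen in partitions:
        quorate = has_quorum(votes_seen, total_votes)
        if two_node and total_votes == 2 and votes_seen == 1:
            quorate = True
        views.append(quorate)
    return views


# Two nodes lose contact with each other: each sees only its own vote.
strict = partition_view([1, 1], total_votes=2)
relaxed = partition_view([1, 1], total_votes=2, two_node=True)
print(strict)   # neither side is quorate, so nothing runs at all
print(relaxed)  # both sides are quorate, so both may start resources
```
</pre>
      <p>In this sketch, strict voting stops both nodes and the relaxed
        mode lets both run, so neither setting tells a node whether its
        peer is really down. That is the gap fencing fills.</p>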
      digimer<br>
      <br>
      On 2017-11-14 10:45 PM, Garima wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:86CE92308A86944992E6D1B25BD638AEAE79734A@EXCH-MB01-DEL.nectechnologies.in">
      <style><!--
/* Font Definitions */
@font-face
        {font-family:Helvetica;
        panose-1:2 11 6 4 2 2 2 2 2 4;}
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.apple-converted-space
        {mso-style-name:apple-converted-space;}
span.EmailStyle18
        {mso-style-type:personal-reply;
        font-family:"Calibri",sans-serif;
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style>
      <div class="WordSection1">
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Hello
            All,<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">A
            split-brain situation occurs when quorum is lost and
            status information is no longer
            exchanged between the two nodes of the cluster. <o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">This
            can be avoided if quorum is communicated between both
            nodes.
            <o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">I
            have checked the code. In my opinion, these files
            (quorum.py/stonith.py) need to be updated to avoid the split-brain
            situation and maintain the Active-Passive configuration.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Regards,<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Garima<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <div>
          <div style="border:none;border-top:solid #E1E1E1
            1.0pt;padding:3.0pt 0cm 0cm 0cm">
            <p class="MsoNormal"><b><span
                  style="font-size:11.0pt;font-family:"Calibri",sans-serif"
                  lang="EN-US">From:</span></b><span
                style="font-size:11.0pt;font-family:"Calibri",sans-serif"
                lang="EN-US"> Derek Wuelfrath
                [<a class="moz-txt-link-freetext" href="mailto:dwuelfrath@inverse.ca">mailto:dwuelfrath@inverse.ca</a>]
                <br>
                <b>Sent:</b> 13 November 2017 20:55<br>
                <b>To:</b> Cluster Labs - All topics related to
                open-source clustering welcomed
                <a class="moz-txt-link-rfc2396E" href="mailto:users@clusterlabs.org">&lt;users@clusterlabs.org&gt;</a><br>
                <b>Subject:</b> Re: [ClusterLabs] Pacemaker responsible
                of DRBD and a systemd resource<o:p></o:p></span></p>
          </div>
        </div>
        <p class="MsoNormal"><o:p> </o:p></p>
        <p class="MsoNormal">Hello Ken!<o:p></o:p></p>
        <div>
          <p class="MsoNormal"><o:p> </o:p></p>
        </div>
        <div>
          <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
            <p class="MsoNormal">Make sure that the systemd service is
              not enabled. If pacemaker is<br>
              managing a service, systemd can't also be trying to start
              and stop it.<o:p></o:p></p>
          </blockquote>
          <p class="MsoNormal"><o:p> </o:p></p>
        </div>
        <div>
          <p class="MsoNormal">It is not. I made sure of this in the
            first place :)<o:p></o:p></p>
        </div>
        <div>
          <p class="MsoNormal"><o:p> </o:p></p>
        </div>
        <div>
          <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
            <p class="MsoNormal">Beyond that, the question is what log
              messages are there from around<br>
              the time of the issue (on both nodes).<o:p></o:p></p>
          </blockquote>
          <p class="MsoNormal"><o:p> </o:p></p>
        </div>
        <div>
          <p class="MsoNormal">Well, that’s the thing. There are not many
            log messages telling what is actually happening. The
            ‘systemd’ resource is not even trying to start (nothing in
            either log for that resource). Here are the logs from my
            last attempt:<o:p></o:p></p>
        </div>
        <div>
          <p class="MsoNormal">Scenario:<o:p></o:p></p>
        </div>
        <div>
          <p class="MsoNormal">- Services were running on
            ‘pancakeFence2’. DRBD was synced and connected<o:p></o:p></p>
        </div>
        <div>
          <p class="MsoNormal">- I rebooted ‘pancakeFence2’. Services
            failed over to ‘pancakeFence1’<o:p></o:p></p>
        </div>
        <div>
          <p class="MsoNormal">- After ‘pancakeFence2’ comes back,
            services are running just fine on ‘pancakeFence1’ but DRBD
            is in Standalone due to split-brain<o:p></o:p></p>
        </div>
        <div>
          <p class="MsoNormal"><o:p> </o:p></p>
        </div>
        <div>
          <p class="MsoNormal">Logs for pancakeFence1: <a
              href="https://pastebin.com/dVSGPP78"
              moz-do-not-send="true">https://pastebin.com/dVSGPP78</a><o:p></o:p></p>
        </div>
        <div>
          <p class="MsoNormal">Logs for pancakeFence2: <a
              href="https://pastebin.com/at8qPkHE"
              moz-do-not-send="true">https://pastebin.com/at8qPkHE</a><o:p></o:p></p>
        </div>
        <div>
          <p class="MsoNormal"><o:p> </o:p></p>
        </div>
        <div>
          <p class="MsoNormal">It really looks like the status check
            mechanism of corosync/pacemaker for a systemd resource forces
            the resource to “start” and therefore starts the ones above
            that resource in the group (DRBD in this instance).<o:p></o:p></p>
        </div>
        <div>
          <p class="MsoNormal">This does not happen for a regular OCF
            resource (IPaddr2, for example).<o:p></o:p></p>
        </div>
        <div>
          <div>
            <div>
              <div>
                <div>
                  <div>
                    <div>
                      <div>
                        <div>
                          <div>
                            <div>
                              <div>
                                <div>
                                  <p class="MsoNormal"><span
style="font-size:9.0pt;font-family:"Helvetica",sans-serif;color:black"><br>
                                      Cheers!<o:p></o:p></span></p>
                                </div>
                                <div>
                                  <p class="MsoNormal"><span
style="font-size:9.0pt;font-family:"Helvetica",sans-serif;color:black">-dw<o:p></o:p></span></p>
                                </div>
                                <div>
                                  <p class="MsoNormal"><span
style="font-size:9.0pt;font-family:"Helvetica",sans-serif;color:black"><o:p> </o:p></span></p>
                                </div>
                                <div>
                                  <p class="MsoNormal"><span
style="font-size:9.0pt;font-family:"Helvetica",sans-serif;color:black">--<o:p></o:p></span></p>
                                </div>
                                <div>
                                  <p class="MsoNormal"><span
style="font-size:9.0pt;font-family:"Helvetica",sans-serif;color:black">Derek
                                      Wuelfrath<o:p></o:p></span></p>
                                </div>
                                <div>
                                  <p class="MsoNormal"><span
style="font-size:9.0pt;font-family:"Helvetica",sans-serif;color:black"><a
href="mailto:dwuelfrath@inverse.ca" moz-do-not-send="true">dwuelfrath@inverse.ca</a> ::
                                      +1.514.447.4918 (x110) ::
                                      +1.866.353.6153 (x110)<o:p></o:p></span></p>
                                </div>
                                <div>
                                  <p class="MsoNormal"><span
style="font-size:9.0pt;font-family:"Helvetica",sans-serif;color:black">Inverse
                                      inc. :: Leaders behind SOGo (<a
                                        href="https://www.sogo.nu/"
                                        moz-do-not-send="true">www.sogo.nu</a>),
                                      PacketFence (<a
                                        href="https://www.packetfence.org/"
                                        moz-do-not-send="true">www.packetfence.org</a>)
                                      and Fingerbank (<a
                                        href="https://www.fingerbank.org"
                                        moz-do-not-send="true">www.fingerbank.org</a>)<o:p></o:p></span></p>
                                </div>
                              </div>
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
        <div>
          <p class="MsoNormal"><br>
            <br>
            <o:p></o:p></p>
          <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
            <div>
              <p class="MsoNormal">On Nov 10, 2017, at 11:39, Ken
                Gaillot &lt;<a href="mailto:kgaillot@redhat.com"
                  moz-do-not-send="true">kgaillot@redhat.com</a>&gt;
                wrote:<o:p></o:p></p>
            </div>
            <p class="MsoNormal"><o:p> </o:p></p>
            <div>
              <p class="MsoNormal"><span
                  style="font-size:9.0pt;font-family:"Helvetica",sans-serif">On
                  Thu, 2017-11-09 at 20:27 -0500, Derek Wuelfrath wrote:<br
                    style="font-variant-caps:
                    normal;text-align:start;-webkit-text-stroke-width:
                    0px;word-spacing:0px">
                  <br>
                </span><o:p></o:p></p>
              <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                <p class="MsoNormal"><span
                    style="font-size:9.0pt;font-family:"Helvetica",sans-serif">Hello
                    there,<br>
                    <br>
                    First post here, but I have been following for a while!<o:p></o:p></span></p>
              </blockquote>
              <p class="MsoNormal"><span
                  style="font-size:9.0pt;font-family:"Helvetica",sans-serif"><br>
                  Welcome!<br>
                  <br style="font-variant-caps:
                    normal;text-align:start;-webkit-text-stroke-width:
                    0px;word-spacing:0px">
                  <br>
                </span><o:p></o:p></p>
              <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                <p class="MsoNormal"><span
                    style="font-size:9.0pt;font-family:"Helvetica",sans-serif"><br>
                    Here’s my issue:<br>
                    we have been setting up and running this type of
                    cluster for a<br>
                    while and never really encountered this kind of
                    problem.<br>
                    <br>
                    I recently set up a Corosync / Pacemaker / PCS
                    cluster to manage DRBD<br>
                    along with various other resources. Some of these
                    resources are<br>
                    systemd resources… this is the part where
                    things are “breaking”.<br>
                    <br>
                    A two-server cluster running only DRBD, or
                    DRBD with an OCF<br>
                    IPaddr2 resource (the cluster IP in this instance), works just
                    fine. I can<br>
                    easily move from one node to the other without any
                    issue.<br>
                    As soon as I add a systemd resource to the resource
                    group, things<br>
                    break. Moving from one node to the other using
                    standby mode works<br>
                    just fine, but as soon as a Corosync / Pacemaker
                    restart involves<br>
                    polling of a systemd resource, it seems to
                    try to start<br>
                    the whole resource group and therefore creates a
                    split-brain of the<br>
                    DRBD resource.<o:p></o:p></span></p>
              </blockquote>
              <p class="MsoNormal"><span
                  style="font-size:9.0pt;font-family:"Helvetica",sans-serif"><br>
                  My first two suggestions would be:<br>
                  <br>
                  Make sure that the systemd service is not enabled. If
                  pacemaker is<br>
                  managing a service, systemd can't also be trying to
                  start and stop it.<br>
                  <br>
                  Fencing is the only way pacemaker can resolve
                  split-brains and certain<br>
                  other situations, so that will help in the recovery.<br>
                  <br>
                  Beyond that, the question is what log messages are
                  there from around<br>
                  the time of the issue (on both nodes).<br>
                  <br>
                  <br style="font-variant-caps:
                    normal;text-align:start;-webkit-text-stroke-width:
                    0px;word-spacing:0px">
                  <br>
                </span><o:p></o:p></p>
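              <p>A minimal sketch of how the first suggestion above could be
                checked from a script, assuming Python 3.7 or later on a
                systemd host. <code>systemctl is-enabled</code> is the real
                command; the helper names and the unit name
                <code>example.service</code> are hypothetical and not from
                this thread.</p>
<pre>
```python
import subprocess

# States reported by `systemctl is-enabled` that mean systemd itself will
# start the unit at boot. Treating only these two as "enabled" is a
# simplification made for this sketch.
BOOT_ENABLED_STATES = {"enabled", "enabled-runtime"}


def starts_at_boot(state):
    """Return True if the is-enabled state means systemd starts the unit."""
    return state.strip() in BOOT_ENABLED_STATES


def query_unit_state(unit):
    """Ask systemd for the enablement state of a unit (needs a systemd host)."""
    result = subprocess.run(
        ["systemctl", "is-enabled", unit],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit status is expected for disabled units
    )
    return result.stdout.strip()


def safe_for_pacemaker(unit):
    """A unit managed by the cluster should not also be started by systemd."""
    return not starts_at_boot(query_unit_state(unit))
```
</pre>
              <p>Running <code>safe_for_pacemaker("example.service")</code> on
                each node would return False where the unit is still enabled
                in systemd, which is the situation to rule out before looking
                further at the logs.</p>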
              <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                <p class="MsoNormal"><span
                    style="font-size:9.0pt;font-family:"Helvetica",sans-serif"><br>
                    It is the best explanation / description of the
                    situation that I can<br>
                    give. If it needs any clarification or examples, I am
                    more than happy<br>
                    to share them.<br>
                    <br>
                    Any guidance would be appreciated :)<br>
                    <br>
                    Here’s the output of a ‘pcs config’<br>
                    <br>
                    <a href="https://pastebin.com/1TUvZ4X9"
                      moz-do-not-send="true">https://pastebin.com/1TUvZ4X9</a><br>
                    <br>
                    Cheers!<br>
                    -dw<br>
                    <br>
                    --<br>
                    Derek Wuelfrath<br>
                    <a href="mailto:dwuelfrath@inverse.ca"
                      moz-do-not-send="true">dwuelfrath@inverse.ca</a> ::
                    +1.514.447.4918 (x110) :: +1.866.353.6153<br>
                    (x110)<br>
                    Inverse inc. :: Leaders behind SOGo (<a
                      href="http://www.sogo.nu" moz-do-not-send="true">www.sogo.nu</a>),
                    PacketFence<br>
                    (<a href="http://www.packetfence.org"
                      moz-do-not-send="true">www.packetfence.org</a>)
                    and Fingerbank (<a href="http://www.fingerbank.org"
                      moz-do-not-send="true">www.fingerbank.org</a>)<o:p></o:p></span></p>
              </blockquote>
              <p class="MsoNormal"><span
                  style="font-size:9.0pt;font-family:"Helvetica",sans-serif">--<span
                    class="apple-converted-space"> </span><br>
                  Ken Gaillot &lt;</span><a
                  href="mailto:kgaillot@redhat.com"
                  moz-do-not-send="true"><span
                    style="font-size:9.0pt;font-family:"Helvetica",sans-serif">kgaillot@redhat.com</span></a><span
style="font-size:9.0pt;font-family:"Helvetica",sans-serif">&gt;<br>
                  <br>
                  _______________________________________________<br>
                  Users mailing list:<span class="apple-converted-space"> </span></span><a
                  href="mailto:Users@clusterlabs.org"
                  moz-do-not-send="true"><span
                    style="font-size:9.0pt;font-family:"Helvetica",sans-serif">Users@clusterlabs.org</span></a><span
style="font-size:9.0pt;font-family:"Helvetica",sans-serif"><br>
                </span><a
                  href="http://lists.clusterlabs.org/mailman/listinfo/users"
                  moz-do-not-send="true"><span
                    style="font-size:9.0pt;font-family:"Helvetica",sans-serif">http://lists.clusterlabs.org/mailman/listinfo/users</span></a><span
style="font-size:9.0pt;font-family:"Helvetica",sans-serif"><br>
                  <br>
                  Project Home:<span class="apple-converted-space"> </span></span><a
                  href="http://www.clusterlabs.org/"
                  moz-do-not-send="true"><span
                    style="font-size:9.0pt;font-family:"Helvetica",sans-serif">http://www.clusterlabs.org</span></a><span
style="font-size:9.0pt;font-family:"Helvetica",sans-serif"><br>
                  Getting started:<span class="apple-converted-space"> </span></span><a
href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf"
                  moz-do-not-send="true"><span
                    style="font-size:9.0pt;font-family:"Helvetica",sans-serif">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</span></a><span
style="font-size:9.0pt;font-family:"Helvetica",sans-serif"><br>
                  Bugs:<span class="apple-converted-space"> </span></span><a
                  href="http://bugs.clusterlabs.org/"
                  moz-do-not-send="true"><span
                    style="font-size:9.0pt;font-family:"Helvetica",sans-serif">http://bugs.clusterlabs.org</span></a><o:p></o:p></p>
            </div>
          </blockquote>
        </div>
        <p class="MsoNormal"><o:p> </o:p></p>
      </div>
    </blockquote>
    <p><br>
    </p>
    <pre class="moz-signature" cols="72">-- 
Digimer
Papers and Projects: <a class="moz-txt-link-freetext" href="https://alteeve.com/w/">https://alteeve.com/w/</a>
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould</pre>
  </body>
</html>