<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 7, 2019 at 1:00 PM Klaus Wenninger &lt;<a href="mailto:kwenning@redhat.com">kwenning@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div bgcolor="#FFFFFF">

    <div class="gmail-m_-5926390740668612657moz-cite-prefix">On 8/7/19 12:26 PM, Momcilo Medic

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr"> We have a three-node cluster that is set up to stop

        resources on lost quorum.<br>

        Failure (network going down) handling is done properly, but

        recovery doesn't seem to work.<br>

      </div>

    </blockquote>

    <tt>What do you mean by 'network going down'?</tt><tt><br>

    </tt><tt>Loss of link? Does the IP persist on the interface</tt><tt><br>

    </tt><tt>in that case?</tt><tt><br></tt></div></blockquote><div><br></div><div>Yes, we simulate a faulty cable by turning the switch ports down and up.<br>In such a case, the IP does not persist on the interface.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><tt>

    </tt><tt>That there are issues reconnecting the CPG-API</tt><tt><br>

    </tt><tt>sounds strange to me. Already the fact that</tt><tt><br>

    </tt><tt>something has to be reconnected. I got it</tt><tt><br>

    </tt><tt>that your nodes were persistently up during the</tt><tt><br>

    </tt><tt>network-disconnection. Although I would have</tt><tt><br>

    </tt><tt>expected fencing to kick in at least on those</tt><tt><br>

    </tt><tt>which are part of the non-quorate cluster-partition.</tt><tt><br>

    </tt><tt>Maybe a few words more on your scenario</tt><tt><br>

    </tt><tt>(fencing-setup e.g.) would help to understand what</tt><tt><br>

    </tt><tt>is going on.</tt><tt><br></tt></div></blockquote><div><br></div><div>We don't use any fencing mechanisms; we rely on quorum to run the services.<br>In more detail, we run a three-node Linbit LINSTOR storage cluster that is hyperconverged,<br>meaning we run clustered storage on the virtualization hypervisors.<br><br>We use pcs to keep the linstor-controller service in high-availability mode.<br>The policy for no quorum is to stop the resources.<br><br>In such a hyperconverged setup, we can't fence a node without impact.<br>Network instability may cause the primary node to no longer be primary.<br>In that case, we don't want running VMs to go down with the ship, as there was no impact on them.<br><br>However, we would like that service to become highly available again once the network is restored, without manual action.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><tt>

    </tt><tt><br>

    </tt><tt>Klaus</tt><br>

    <blockquote type="cite">

      <div dir="ltr"><br>

        What happens is that the services crash when we re-enable the network

        connection.<br>

        <br>

        From journal:<br>

        <br>

        ```<br>

        ...<br>

        Jul 12 00:27:32 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>

        corosync[9069]: corosync: totemsrp.c:1328:

        memb_consensus_agreed: Assertion `token_memb_entries >= 1'

        failed.<br>

        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>

        attrd[9104]:    error: Connection to the CPG API failed: Library

        error (2)<br>

        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>

        stonith-ng[9100]:    error: Connection to the CPG API failed:

        Library error (2)<br>

        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>

        systemd[1]: corosync.service: Main process exited, code=dumped,

        status=6/ABRT<br>

        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>

        cib[9098]:    error: Connection to the CPG API failed: Library

        error (2)<br>

        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>

        systemd[1]: corosync.service: Failed with result 'core-dump'.<br>

        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>

        pacemakerd[9087]:    error: Connection to the CPG API failed:

        Library error (2)<br>

        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>

        systemd[1]: pacemaker.service: Main process exited, code=exited,

        status=107/n/a<br>

        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>

        systemd[1]: pacemaker.service: Failed with result 'exit-code'.<br>

        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>

        systemd[1]: Stopped Pacemaker High Availability Cluster Manager.<br>

        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>

        lrmd[9102]:  warning: new_event_notification (9102-9107-7): Bad

        file descriptor (9)<br>

        ...<br>

        ```<br>

        Pacemaker's log shows no relevant info.<br>

        <br>

        This is from corosync's log:<br>

        <br>

        ```<br>

        Jul 12 00:27:33 [9107] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>      

        crmd:     info: qb_ipcs_us_withdraw:    withdrawing server

        sockets<br>

        Jul 12 00:27:33 [9104] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>    

         attrd:    error: pcmk_cpg_dispatch:      Connection to the CPG

        API failed: Library error (2)<br>

        Jul 12 00:27:33 [9100] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>

        stonith-ng:    error: pcmk_cpg_dispatch:      Connection to the

        CPG API failed: Library error (2)<br>

        Jul 12 00:27:33 [9098] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>      

         cib:    error: pcmk_cpg_dispatch:      Connection to the CPG

        API failed: Library error (2)<br>

        Jul 12 00:27:33 [9087] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>

        pacemakerd:    error: pcmk_cpg_dispatch:      Connection to the

        CPG API failed: Library error (2)<br>

        Jul 12 00:27:33 [9104] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>    

         attrd:     info: qb_ipcs_us_withdraw:    withdrawing server

        sockets<br>

        Jul 12 00:27:33 [9087] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>

        pacemakerd:     info: crm_xml_cleanup:        Cleaning up memory

        from libxml2<br>

        Jul 12 00:27:33 [9107] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>      

        crmd:     info: crm_xml_cleanup:        Cleaning up memory from

        libxml2<br>

        Jul 12 00:27:33 [9100] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>

        stonith-ng:     info: qb_ipcs_us_withdraw:    withdrawing server

        sockets<br>

        Jul 12 00:27:33 [9104] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>    

         attrd:     info: crm_xml_cleanup:        Cleaning up memory

        from libxml2<br>

        Jul 12 00:27:33 [9098] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>      

         cib:     info: qb_ipcs_us_withdraw:    withdrawing server

        sockets<br>

        Jul 12 00:27:33 [9100] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>

        stonith-ng:     info: crm_xml_cleanup:        Cleaning up memory

        from libxml2<br>

        Jul 12 00:27:33 [9098] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>      

         cib:     info: qb_ipcs_us_withdraw:    withdrawing server

        sockets<br>

        Jul 12 00:27:33 [9098] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>      

         cib:     info: qb_ipcs_us_withdraw:    withdrawing server

        sockets<br>

        Jul 12 00:27:33 [9098] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>      

         cib:     info: crm_xml_cleanup:        Cleaning up memory from

        libxml2<br>

        Jul 12 00:27:33 [9102] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>      

        lrmd:  warning: qb_ipcs_event_sendv:    new_event_notification

        (9102-9107-7): Bad file descriptor (9)<br>

        ```<br>

        <br>

        Please let me know if you need any further info, I'll be more

        than happy to provide it.<br>

        <br>

        This is always reproducible in our environment:<br>

        Ubuntu 18.04.2<br>

        corosync 2.4.3-0ubuntu1.1<br>

        pcs 0.9.164-1<br>

        <div>pacemaker 1.1.18-0ubuntu1.1</div>

        <div><br>

        </div>

        <div>Kind regards,</div>

        <div>Momo.<br>

        </div>

      </div>

      <br>


    </blockquote>

    <br>

  </div>

</blockquote></div></div>
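<div dir="ltr"><br>For reference, the ordering in the journal excerpt quoted above (the corosync assertion at 00:27:32, followed one second later by the CPG API errors from the Pacemaker daemons) can be checked mechanically. Below is a rough Python sketch, assuming the journal line layout seen in that excerpt (timestamp, host, <tt>daemon[pid]:</tt>, message). The function name <tt>first_failure</tt>, the marker strings, and the unwrapped sample lines are for illustration only and are not part of corosync or Pacemaker.<br><br></div>
<pre>
```python
import re

# Assumed layout, taken from the journal excerpt:
#   "Jul 12 00:27:32 host daemon[pid]: message"
# Groups: 1 = timestamp, 2 = host, 3 = daemon, 4 = pid, 5 = message.
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+([\w.-]+)\[(\d+)\]:\s+(.*)$"
)

# Substrings treated as failures here; an illustrative choice, not an official list.
FAILURE_MARKERS = ("Assertion", "Connection to the CPG API failed")


def first_failure(lines):
    """Return (timestamp, daemon, message) for the first failure line, or None."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        message = match.group(5)
        if any(marker in message for marker in FAILURE_MARKERS):
            return match.group(1), match.group(3), message
    return None


# Three lines from the excerpt, unwrapped onto single lines.
SAMPLE = [
    "Jul 12 00:27:32 itaftestkvmls02.dc.itaf.eu corosync[9069]: corosync: "
    "totemsrp.c:1328: memb_consensus_agreed: Assertion `token_memb_entries >= 1' failed.",
    "Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu attrd[9104]:    error: "
    "Connection to the CPG API failed: Library error (2)",
    "Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: corosync.service: "
    "Main process exited, code=dumped, status=6/ABRT",
]

if __name__ == "__main__":
    print(first_failure(SAMPLE))
```
</pre>
<div dir="ltr">On these sample lines it reports the corosync assertion at 00:27:32 as the first failure, which matches the reading that the CPG API errors are a consequence of corosync aborting rather than a separate problem.</div>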