<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 7, 2019 at 1:00 PM Klaus Wenninger <<a href="mailto:kwenning@redhat.com">kwenning@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF">
    <div class="gmail-m_-5926390740668612657moz-cite-prefix">On 8/7/19 12:26 PM, Momcilo Medic
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr"> We have a three-node cluster that is set up to stop
        resources when quorum is lost.<br>
        Failure handling (network going down) works properly, but
        recovery doesn't seem to work.<br>
      </div>
    </blockquote>
    <tt>What do you mean by 'network going down'?</tt><tt><br>
    </tt><tt>Loss of link? Does the IP persist on the interface</tt><tt><br>
    </tt><tt>in that case?</tt><tt><br></tt></div></blockquote><div><br></div><div>Yes, we simulate a faulty cable by turning the switch ports down and back up.<br>In that case, the IP does not persist on the interface.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><tt>
    </tt><tt>That there are issues reconnecting the CPG-API</tt><tt><br>
    </tt><tt>sounds strange to me. Already the fact that</tt><tt><br>
    </tt><tt>something has to be reconnected. I got it</tt><tt><br>
    </tt><tt>that your nodes were persistently up during the</tt><tt><br>
    </tt><tt>network-disconnection. Although I would have</tt><tt><br>
    </tt><tt>expected fencing to kick in at least on those</tt><tt><br>
    </tt><tt>which are part of the non-quorate cluster-partition.</tt><tt><br>
    </tt><tt>Maybe a few words more on your scenario</tt><tt><br>
    </tt><tt>(fencing-setup e.g.) would help to understand what</tt><tt><br>
    </tt><tt>is going on.</tt><tt><br></tt></div></blockquote><div><br></div><div>We don't use any fencing mechanisms; we rely on quorum to run the services.<br>In more detail, we run a three-node, hyperconverged Linbit LINSTOR storage cluster.<br>That is, we run clustered storage on the virtualization hypervisors.<br><br>We use pcs to run the linstor-controller service in high-availability mode.<br>The no-quorum policy is to stop the resources.<br><br>In such a hyperconverged setup, we can't fence a node without impact.<br>Network instability may cause the primary node to lose its primary role.<br>In that case, we don't want running VMs to go down with the ship, as they were not impacted.<br><br>However, we would like that service to become highly available again once the network is restored, without manual action.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><tt>
    </tt><tt><br>
    </tt><tt>Klaus</tt><br>
    <blockquote type="cite">
      <div dir="ltr"><br>
        What happens is that the services crash when we re-enable the
        network connection.<br>
        <br>
        From journal:<br>
        <br>
        ```<br>
        ...<br>
        Jul 12 00:27:32 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>
        corosync[9069]: corosync: totemsrp.c:1328:
        memb_consensus_agreed: Assertion `token_memb_entries >= 1'
        failed.<br>
        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>
        attrd[9104]:    error: Connection to the CPG API failed: Library
        error (2)<br>
        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>
        stonith-ng[9100]:    error: Connection to the CPG API failed:
        Library error (2)<br>
        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>
        systemd[1]: corosync.service: Main process exited, code=dumped,
        status=6/ABRT<br>
        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>
        cib[9098]:    error: Connection to the CPG API failed: Library
        error (2)<br>
        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>
        systemd[1]: corosync.service: Failed with result 'core-dump'.<br>
        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>
        pacemakerd[9087]:    error: Connection to the CPG API failed:
        Library error (2)<br>
        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>
        systemd[1]: pacemaker.service: Main process exited, code=exited,
        status=107/n/a<br>
        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>
        systemd[1]: pacemaker.service: Failed with result 'exit-code'.<br>
        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>
        systemd[1]: Stopped Pacemaker High Availability Cluster Manager.<br>
        Jul 12 00:27:33 <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>
        lrmd[9102]:  warning: new_event_notification (9102-9107-7): Bad
        file descriptor (9)<br>
        ...<br>
        ```<br>
        Pacemaker's log shows no relevant info.<br>
        <br>
        This is from corosync's log:<br>
        <br>
        ```<br>
        Jul 12 00:27:33 [9107] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>      
        crmd:     info: qb_ipcs_us_withdraw:    withdrawing server
        sockets<br>
        Jul 12 00:27:33 [9104] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>    
         attrd:    error: pcmk_cpg_dispatch:      Connection to the CPG
        API failed: Library error (2)<br>
        Jul 12 00:27:33 [9100] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>
        stonith-ng:    error: pcmk_cpg_dispatch:      Connection to the
        CPG API failed: Library error (2)<br>
        Jul 12 00:27:33 [9098] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>      
         cib:    error: pcmk_cpg_dispatch:      Connection to the CPG
        API failed: Library error (2)<br>
        Jul 12 00:27:33 [9087] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>
        pacemakerd:    error: pcmk_cpg_dispatch:      Connection to the
        CPG API failed: Library error (2)<br>
        Jul 12 00:27:33 [9104] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>    
         attrd:     info: qb_ipcs_us_withdraw:    withdrawing server
        sockets<br>
        Jul 12 00:27:33 [9087] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>
        pacemakerd:     info: crm_xml_cleanup:        Cleaning up memory
        from libxml2<br>
        Jul 12 00:27:33 [9107] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>      
        crmd:     info: crm_xml_cleanup:        Cleaning up memory from
        libxml2<br>
        Jul 12 00:27:33 [9100] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>
        stonith-ng:     info: qb_ipcs_us_withdraw:    withdrawing server
        sockets<br>
        Jul 12 00:27:33 [9104] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>    
         attrd:     info: crm_xml_cleanup:        Cleaning up memory
        from libxml2<br>
        Jul 12 00:27:33 [9098] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>      
         cib:     info: qb_ipcs_us_withdraw:    withdrawing server
        sockets<br>
        Jul 12 00:27:33 [9100] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>
        stonith-ng:     info: crm_xml_cleanup:        Cleaning up memory
        from libxml2<br>
        Jul 12 00:27:33 [9098] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>      
         cib:     info: qb_ipcs_us_withdraw:    withdrawing server
        sockets<br>
        Jul 12 00:27:33 [9098] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>      
         cib:     info: qb_ipcs_us_withdraw:    withdrawing server
        sockets<br>
        Jul 12 00:27:33 [9098] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>      
         cib:     info: crm_xml_cleanup:        Cleaning up memory from
        libxml2<br>
        Jul 12 00:27:33 [9102] <a href="http://itaftestkvmls02.dc.itaf.eu" target="_blank">itaftestkvmls02.dc.itaf.eu</a>      
        lrmd:  warning: qb_ipcs_event_sendv:    new_event_notification
        (9102-9107-7): Bad file descriptor (9)<br>
        ```<br>
        <br>
        Please let me know if you need any further info, I'll be more
        than happy to provide it.<br>
        <br>
        This is always reproducible in our environment:<br>
        Ubuntu 18.04.2<br>
        corosync 2.4.3-0ubuntu1.1<br>
        pcs 0.9.164-1<br>
        <div>pacemaker 1.1.18-0ubuntu1.1</div>
        <div><br>
        </div>
        <div>Kind regards,</div>
        <div>Momo.<br>
        </div>
      </div>
      <br>
    </blockquote>
    <br>
  </div>

</blockquote></div></div>
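<div dir="ltr"><br>For reference, the quorum policy described above can be sketched with pcs roughly as follows. This is only an illustration based on the description in this thread, not a copy of the actual cluster configuration. The stonith-enabled line in particular is an assumption that follows from running without fencing, and the syntax is that of pcs 0.9.<br>
<pre>
```shell
# Stop all resources in a partition that has lost quorum.
pcs property set no-quorum-policy=stop

# Assumption: fencing is disabled cluster-wide, since no fence devices are used.
pcs property set stonith-enabled=false

# Show the value currently in effect (pcs 0.9 syntax).
pcs property show no-quorum-policy
```
</pre>
Assuming one vote per node, a three-node cluster needs two votes for quorum, so a single isolated node stops its resources while the other two keep running.</div>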