[ClusterLabs] Restoring network connection breaks cluster services

Klaus Wenninger kwenning at redhat.com
Wed Aug 7 07:00:55 EDT 2019


On 8/7/19 12:26 PM, Momcilo Medic wrote:
> We have a three-node cluster that is set up to stop resources on lost quorum.
> Failure (network going down) handling is done properly, but recovery
> doesn't seem to work.
What do you mean by 'network going down'?
Loss of link? Does the IP address persist on the interface
in that case?
That there are issues reconnecting to the CPG API
sounds strange to me, as does the fact that
anything has to be reconnected at all. I take it
that your nodes stayed up throughout the
network disconnection, although I would have
expected fencing to kick in at least on those
that are part of the non-quorate cluster partition.
A few more words on your scenario
(the fencing setup, for example) would help to understand what
is going on.
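
To answer the two questions above (is the link lost, and does the address stay on the interface), something like the following could be run on a node while the network is "down". This is only a minimal sketch: the function names, the default interface name in the usage note, and the idea of feeding in the text printed by `ip -o addr show` are assumptions for illustration, not part of the reported setup.

```python
from pathlib import Path


def link_state(iface, sysfs_root="/sys/class/net"):
    """Return the kernel's view of an interface: operstate and carrier."""
    base = Path(sysfs_root) / iface
    operstate = (base / "operstate").read_text().strip()
    try:
        carrier = (base / "carrier").read_text().strip() == "1"
    except OSError:
        # Reading "carrier" fails while the interface is administratively down.
        carrier = False
    return {"operstate": operstate, "carrier": carrier}


def addresses(ip_output, iface):
    """Extract the addresses of iface from the text printed by `ip -o addr show`."""
    found = []
    for line in ip_output.splitlines():
        fields = line.split()
        # One-line format: "<index>: <iface> <family> <address>/<prefix> ..."
        if len(fields) >= 4 and fields[1] == iface and fields[2] in ("inet", "inet6"):
            found.append(fields[3])
    return found
```

Calling `link_state("eth0")` and `addresses(output, "eth0")` once before and once during the outage (with `output` captured from `ip -o addr show`, and `eth0` standing in for the real interface) would show whether only the carrier drops or whether the address disappears as well.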

Klaus
>
> What happens is that services crash when we re-enable the network connection.
>
> From journal:
>
> ```
> ...
> Jul 12 00:27:32 itaftestkvmls02.dc.itaf.eu corosync[9069]: corosync: totemsrp.c:1328: memb_consensus_agreed: Assertion `token_memb_entries >= 1' failed.
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu attrd[9104]:    error: Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu stonith-ng[9100]:    error: Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: corosync.service: Main process exited, code=dumped, status=6/ABRT
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu cib[9098]:    error: Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: corosync.service: Failed with result 'core-dump'.
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu pacemakerd[9087]:    error: Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: pacemaker.service: Main process exited, code=exited, status=107/n/a
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: pacemaker.service: Failed with result 'exit-code'.
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
> Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu lrmd[9102]:  warning: new_event_notification (9102-9107-7): Bad file descriptor (9)
> ...
> ```
> Pacemaker's log shows no relevant info.
>
> This is from corosync's log:
>
> ```
> Jul 12 00:27:33 [9107] itaftestkvmls02.dc.itaf.eu       crmd:     info: qb_ipcs_us_withdraw:    withdrawing server sockets
> Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu      attrd:    error: pcmk_cpg_dispatch:      Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng:    error: pcmk_cpg_dispatch:      Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu        cib:    error: pcmk_cpg_dispatch:      Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 [9087] itaftestkvmls02.dc.itaf.eu pacemakerd:    error: pcmk_cpg_dispatch:      Connection to the CPG API failed: Library error (2)
> Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu      attrd:     info: qb_ipcs_us_withdraw:    withdrawing server sockets
> Jul 12 00:27:33 [9087] itaftestkvmls02.dc.itaf.eu pacemakerd:     info: crm_xml_cleanup:        Cleaning up memory from libxml2
> Jul 12 00:27:33 [9107] itaftestkvmls02.dc.itaf.eu       crmd:     info: crm_xml_cleanup:        Cleaning up memory from libxml2
> Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng:     info: qb_ipcs_us_withdraw:    withdrawing server sockets
> Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu      attrd:     info: crm_xml_cleanup:        Cleaning up memory from libxml2
> Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu        cib:     info: qb_ipcs_us_withdraw:    withdrawing server sockets
> Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng:     info: crm_xml_cleanup:        Cleaning up memory from libxml2
> Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu        cib:     info: qb_ipcs_us_withdraw:    withdrawing server sockets
> Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu        cib:     info: qb_ipcs_us_withdraw:    withdrawing server sockets
> Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu        cib:     info: crm_xml_cleanup:        Cleaning up memory from libxml2
> Jul 12 00:27:33 [9102] itaftestkvmls02.dc.itaf.eu       lrmd:  warning: qb_ipcs_event_sendv:    new_event_notification (9102-9107-7): Bad file descriptor (9)
> ```
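
Going by the timestamps in the journal excerpt, the corosync assertion at 00:27:32 comes first and the CPG errors from the Pacemaker daemons follow at 00:27:33. A rough way to pull that ordering out of a longer journal is sketched below. It assumes each entry has been rejoined onto a single line with the mail client's link residue removed; the regular expression and the keyword list are illustrative choices, not anything corosync or Pacemaker define.

```python
import re

# Matches "<Mon> <day> <time> <host> <process>[<pid>]: <message>"
LINE = re.compile(r"^(\w{3} +\d+ [\d:]+) (\S+) ([\w.-]+)\[(\d+)\]: +(.*)$")


def failures(journal_text):
    """Yield (timestamp, process, message) for lines that look like failures."""
    for raw in journal_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        stamp, _host, proc, _pid, msg = match.groups()
        # Keyword list is a guess based on the excerpt in this thread.
        if re.search(r"Assertion|error:|Failed|code=dumped", msg):
            yield stamp, proc, msg


# Three lines taken from the journal excerpt quoted above.
SAMPLE = """\
Jul 12 00:27:32 itaftestkvmls02.dc.itaf.eu corosync[9069]: corosync: totemsrp.c:1328: memb_consensus_agreed: Assertion `token_memb_entries >= 1' failed.
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu attrd[9104]:    error: Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: corosync.service: Main process exited, code=dumped, status=6/ABRT
"""

first = next(failures(SAMPLE))
print(first[1], "failed first at", first[0])
```

On these lines the first failure reported is the corosync one, which fits the CPG errors being a consequence of corosync aborting rather than a separate problem.
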
>
> Please let me know if you need any further info, I'll be more than
> happy to provide it.
>
> This is always reproducible in our environment:
> Ubuntu 18.04.2
> corosync 2.4.3-0ubuntu1.1
> pcs 0.9.164-1
> pacemaker 1.1.18-0ubuntu1.1
>
> Kind regards,
> Momo.
>