[ClusterLabs] Restoring network connection breaks cluster services

Momcilo Medic medicmomcilo at gmail.com
Wed Aug 7 06:26:09 EDT 2019


We have a three-node cluster that is set up to stop resources on lost quorum.
Failure handling (the network going down) works properly, but recovery
does not.

When we re-enable the network connection, the cluster services crash.

From the journal:

```
...
Jul 12 00:27:32 itaftestkvmls02.dc.itaf.eu corosync[9069]: corosync: totemsrp.c:1328: memb_consensus_agreed: Assertion `token_memb_entries >= 1' failed.
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu attrd[9104]:    error: Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu stonith-ng[9100]:    error: Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: corosync.service: Main process exited, code=dumped, status=6/ABRT
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu cib[9098]:    error: Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: corosync.service: Failed with result 'core-dump'.
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu pacemakerd[9087]:    error: Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: pacemaker.service: Main process exited, code=exited, status=107/n/a
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: pacemaker.service: Failed with result 'exit-code'.
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu lrmd[9102]:  warning: new_event_notification (9102-9107-7): Bad file descriptor (9)
...
```
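
To make the ordering in the journal excerpt explicit, here is a minimal Python sketch that parses a few of the lines above (wrapped lines joined) and reports the first failure. The helper names `parse` and `first_failure` and the list of failure markers are illustrative assumptions, not part of corosync, Pacemaker, or systemd.

```python
import re

# Four lines copied from the journal excerpt in this message, unwrapped.
JOURNAL = """\
Jul 12 00:27:32 itaftestkvmls02.dc.itaf.eu corosync[9069]: corosync: totemsrp.c:1328: memb_consensus_agreed: Assertion `token_memb_entries >= 1' failed.
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu attrd[9104]:    error: Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu systemd[1]: corosync.service: Main process exited, code=dumped, status=6/ABRT
Jul 12 00:27:33 itaftestkvmls02.dc.itaf.eu pacemakerd[9087]:    error: Connection to the CPG API failed: Library error (2)
"""

# Layout assumed here: "<timestamp> <host> <process>[<pid>]: <message>"
LINE_RE = re.compile(r"^(\w{3} +\d+ [\d:]+) (\S+) ([\w-]+)\[(\d+)\]: +(.*)$")


def parse(text):
    """Turn journal text into (timestamp, process, pid, message) tuples."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            stamp, _host, proc, pid, msg = m.groups()
            events.append((stamp, proc, int(pid), msg))
    return events


def first_failure(events):
    """Return the earliest event whose message looks like a failure, or None."""
    markers = ("Assertion", "error:", "failed")  # hypothetical marker list
    for event in events:
        if any(marker in event[3] for marker in markers):
            return event
    return None


events = parse(JOURNAL)
stamp, proc, pid, msg = first_failure(events)
print(f"first failure: {proc}[{pid}] at {stamp}")
```

On this excerpt the first failure reported is the corosync assertion at 00:27:32, one second before the Pacemaker daemons log that their CPG connection failed.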
Pacemaker's log shows no relevant info.

This is from corosync's log:

```
Jul 12 00:27:33 [9107] itaftestkvmls02.dc.itaf.eu       crmd:     info: qb_ipcs_us_withdraw:    withdrawing server sockets
Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu      attrd:    error: pcmk_cpg_dispatch:      Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng:    error: pcmk_cpg_dispatch:      Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu        cib:    error: pcmk_cpg_dispatch:      Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 [9087] itaftestkvmls02.dc.itaf.eu pacemakerd:    error: pcmk_cpg_dispatch:      Connection to the CPG API failed: Library error (2)
Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu      attrd:     info: qb_ipcs_us_withdraw:    withdrawing server sockets
Jul 12 00:27:33 [9087] itaftestkvmls02.dc.itaf.eu pacemakerd:     info: crm_xml_cleanup:        Cleaning up memory from libxml2
Jul 12 00:27:33 [9107] itaftestkvmls02.dc.itaf.eu       crmd:     info: crm_xml_cleanup:        Cleaning up memory from libxml2
Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng:     info: qb_ipcs_us_withdraw:    withdrawing server sockets
Jul 12 00:27:33 [9104] itaftestkvmls02.dc.itaf.eu      attrd:     info: crm_xml_cleanup:        Cleaning up memory from libxml2
Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu        cib:     info: qb_ipcs_us_withdraw:    withdrawing server sockets
Jul 12 00:27:33 [9100] itaftestkvmls02.dc.itaf.eu stonith-ng:     info: crm_xml_cleanup:        Cleaning up memory from libxml2
Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu        cib:     info: qb_ipcs_us_withdraw:    withdrawing server sockets
Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu        cib:     info: qb_ipcs_us_withdraw:    withdrawing server sockets
Jul 12 00:27:33 [9098] itaftestkvmls02.dc.itaf.eu        cib:     info: crm_xml_cleanup:        Cleaning up memory from libxml2
Jul 12 00:27:33 [9102] itaftestkvmls02.dc.itaf.eu       lrmd:  warning: qb_ipcs_event_sendv:    new_event_notification (9102-9107-7): Bad file descriptor (9)
```

Please let me know if you need any further info; I'll be more than happy to
provide it.

This is always reproducible in our environment:
Ubuntu 18.04.2
corosync 2.4.3-0ubuntu1.1
pcs 0.9.164-1
pacemaker 1.1.18-0ubuntu1.1

Kind regards,
Momo.