[ClusterLabs] Node reset on shutdown by SBD watchdog with corosync-qdevice

Andrei Borzenkov arvidjaar at gmail.com
Sun Jul 28 12:35:29 EDT 2019


In a two-node cluster with qnetd I consistently see the node that is
shut down last being reset during its shutdown, i.e.

- shut down the first node - OK
- shut down the second node - reset

As far as I understand, what happens is:

- during shutdown pacemaker.service is stopped first. In the above
configuration this leaves corosync.service, corosync-qdevice.service
and sbd.service running (see another mail with subject
"corosync.service (and sbd.service) are not stopper on pacemaker
shutdown when corosync-qdevice is used")

- corosync-qdevice.service is declared After=corosync.service, so on
shutdown it is stopped first

- this immediately removes one vote from quorum

- when the first node is shut down, it remains in quorum (it has lost
the qnetd vote but still sees the second node)

- when the second node is shut down, as soon as
corosync-qdevice.service stops, the node goes out of quorum and SBD
resets it
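
The vote arithmetic behind the steps above can be sketched as a toy
model. This is not corosync code; it assumes each node and the qdevice
carry one vote, so expected votes are 3 and quorum is 2. The function
and variable names are made up for illustration.

```python
# Toy model of the vote arithmetic; not corosync code.
# Assumption: each node and the qdevice carry one vote (expected votes = 3).
EXPECTED_VOTES = 3
QUORUM = EXPECTED_VOTES // 2 + 1  # majority of 3 votes is 2


def is_quorate(peer_nodes_up, qdevice_running):
    """Return True if the local node still sees a majority of votes."""
    votes = 1 + peer_nodes_up + (1 if qdevice_running else 0)
    return votes >= QUORUM


# First node during its shutdown: qdevice stopped, peer still up.
first_node = is_quorate(peer_nodes_up=1, qdevice_running=False)
# Last node before its shutdown: peer gone, qdevice still running.
last_node_before = is_quorate(peer_nodes_up=0, qdevice_running=True)
# Last node once corosync-qdevice.service has stopped.
last_node_after = is_quorate(peer_nodes_up=0, qdevice_running=False)

print(first_node, last_node_before, last_node_after)  # True True False
```

Only the last case drops below quorum, which matches the reset being
seen only on the node that is shut down last.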

Is it possible to start corosync-qdevice.service *before* corosync? Can
it be made intelligent enough to wait for corosync to come up?
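
The reason for asking: systemd stops units in the inverse of their
start-up order, so reversing the ordering would make corosync-qdevice
stop after corosync. A toy model of that rule follows. It is not
systemd code, the helper names are made up, and the "proposed" ordering
is only the hypothetical one asked about here, which would additionally
need corosync-qdevice to wait for corosync at start.

```python
# Toy model of systemd ordering; not systemd code.
# after[x] lists the units that x is ordered After= (x starts later).
def start_order(after):
    order = []

    def visit(unit):
        if unit in order:
            return
        for dep in after.get(unit, []):
            visit(dep)  # units we are ordered after start first
        order.append(unit)

    for unit in after:
        visit(unit)
    return order


def stop_order(after):
    # Shutdown applies the inverse of the start-up order.
    return list(reversed(start_order(after)))


# Current ordering: corosync-qdevice.service has After=corosync.service.
current = {
    "corosync.service": [],
    "corosync-qdevice.service": ["corosync.service"],
}
# Hypothetical ordering from the question: qdevice starts before corosync.
proposed = {
    "corosync-qdevice.service": [],
    "corosync.service": ["corosync-qdevice.service"],
}

print(stop_order(current)[0])   # corosync-qdevice.service
print(stop_order(proposed)[0])  # corosync.service
```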

This basically makes it impossible to shut down cluster nodes safely.

