[ClusterLabs] Node reset on shutdown by SBD watchdog with corosync-qdevice
Andrei Borzenkov
arvidjaar at gmail.com
Sun Jul 28 12:35:29 EDT 2019
In a two-node cluster with qnetd, I consistently see the node that is
shut down last get reset during shutdown, i.e.:
- shutdown the first node - OK
- shutdown the second node - reset
As far as I understand, this is what happens:
- during shutdown pacemaker.service is stopped first. In the above
configuration it leaves corosync.service, corosync-qdevice.service and
sbd.service running (see another mail with subject "corosync.service
(and sbd.service) are not stopper on pacemaker shutdown when
corosync-qdevice is used")
- corosync-qdevice.service is declared After=corosync.service, so on
shutdown it is stopped first
- this immediately removes one vote from quorum
- when the first node is shut down, it remains in quorum (it has lost
the qnetd vote but still sees the second node)
- when the second node is shut down, as soon as corosync-qdevice.service
stops, the node goes out of quorum and SBD resets it
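
The vote arithmetic behind the steps above can be sketched as follows. This is only an illustration, assuming each node carries one vote, the quorum device contributes one more (3 expected votes in total), and quorum is a simple majority; the function names are made up for this example and are not part of corosync.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Simple majority: more than half of the expected votes."""
    return expected_votes // 2 + 1


def votes_seen(peer_up: bool, qdevice_up: bool) -> int:
    """Votes visible to the local node: its own plus peer and qdevice, if present."""
    return 1 + int(peer_up) + int(qdevice_up)


def is_quorate(seen: int, expected_votes: int) -> bool:
    return seen >= quorum_threshold(expected_votes)


# Assumption: 2 nodes x 1 vote + 1 qdevice vote.
EXPECTED_VOTES = 3

# First node shutting down: its qdevice has stopped, the peer is still up.
first_node_quorate = is_quorate(votes_seen(peer_up=True, qdevice_up=False), EXPECTED_VOTES)

# Second node shutting down: the peer is already gone and qdevice has stopped.
second_node_quorate = is_quorate(votes_seen(peer_up=False, qdevice_up=False), EXPECTED_VOTES)

print(first_node_quorate, second_node_quorate)
```

Under these assumptions the first node still sees 2 of 3 votes and stays quorate, while the last node drops to 1 of 3 the moment corosync-qdevice stops, which is the point where SBD reacts.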
Is it possible to start corosync-qdevice.service *before* corosync, so
that on shutdown it is stopped after corosync? Can it be made
intelligent enough to wait for corosync to come up?
The current behavior basically makes it impossible to shut down cluster
nodes safely.
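
To illustrate why the start ordering matters: systemd stops units in the reverse of their After= start order. The sketch below models only that rule for the two units discussed here. The "proposed" ordering is hypothetical; whether corosync-qdevice can really be started before corosync is exactly the open question.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def start_order(after: dict[str, set[str]]) -> list[str]:
    """Start order given a mapping: unit -> units it is ordered After=."""
    return list(TopologicalSorter(after).static_order())


def stop_order(after: dict[str, set[str]]) -> list[str]:
    """Units are stopped in the reverse of their start order."""
    return list(reversed(start_order(after)))


# Current ordering: corosync-qdevice.service has After=corosync.service.
current = {"corosync-qdevice.service": {"corosync.service"}}

# Hypothetical ordering asked about above: corosync starts after qdevice.
proposed = {"corosync.service": {"corosync-qdevice.service"}}

print(stop_order(current))
print(stop_order(proposed))
```

With the current ordering the quorum device is stopped first and its vote disappears while corosync and sbd are still running; with the hypothetical ordering corosync would be stopped first.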