[ClusterLabs] temporary loss of quorum when member starts to rejoin

Andrei Borzenkov arvidjaar at gmail.com
Tue Apr 7 13:16:19 EDT 2020


07.04.2020 00:21, Sherrard Burton wrote:
>>
>> It looks like a timing issue or race condition. After the reboot, the
>> node manages to contact qnetd first, before the connection to the other
>> node is established. Qnetd behaves as documented: it sees two equal-size
>> partitions and favors the partition that includes the tie breaker
>> (lowest node id). So the existing node goes out of quorum. A second
>> later both nodes see each other and quorum is regained.
> 
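
The race described in the quote can be modelled with a small sketch. This is
not qnetd code; the function name and the partition representation are
hypothetical, and the only rule encoded is the one stated above: the largest
partition wins, and equal-size partitions are resolved in favor of the one
containing the lowest node id.

```python
def favored_partition(partitions):
    """Return the partition that would get the arbiter's vote.

    Simplified model (an assumption, not qnetd's implementation):
    the largest partition wins; a tie between equal-size partitions
    goes to the one containing the lowest node id.
    """
    largest = max(len(p) for p in partitions)
    candidates = [p for p in partitions if len(p) == largest]
    if len(candidates) == 1:
        return candidates[0]
    # Tie breaker: the lowest node id among the tied partitions.
    tie_node = min(node for p in candidates for node in p)
    return next(p for p in candidates if tie_node in p)


# Rebooted node 1 reaches the arbiter before it sees node 2, so the
# arbiter sees two partitions of size one.
during_race = [frozenset({1}), frozenset({2})]
# A moment later the nodes see each other and form one partition.
after_join = [frozenset({1, 2})]

print(favored_partition(during_race))
print(favored_partition(after_join))
```

In this model the freshly rebooted node 1 wins the tie during the race, which
is why the node that was already running briefly loses quorum until the
membership merges.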

Define the right problem to solve?

My educated guess is that your problem is not corosync but pacemaker
stopping resources. In that case, just do what has been done for years in
two-node clusters: set no-quorum-policy=ignore and rely on stonith to
resolve split brain.

I dropped the idea of using qdevice in a two-node cluster. If you have a
reliable stonith device, qdevice is not needed, and without stonith, relying
on watchdog suicide has too many problems.

