[ClusterLabs] temporary loss of quorum when member starts to rejoin

Sherrard Burton sb-clusterlabs at allafrica.com
Tue Apr 7 14:13:35 EDT 2020



On 4/7/20 1:16 PM, Andrei Borzenkov wrote:
> 07.04.2020 00:21, Sherrard Burton wrote:
>>>
>>> It looks like some timing issue or race condition. After reboot node
>>> manages to contact qnetd first, before connection to other node is
>>> established. Qnetd behaves as documented - it sees two equal size
>>> partitions and favors the partition that includes tie breaker (lowest
>>> node id). So the existing node goes out of quorum. A second later both
>>> nodes see each other and so quorum is regained.
>>
> 
> Define the right problem to solve?
> 
> My educated guess is that your problem is not corosync but pacemaker
> stopping resources. In this case just do what was done for years in
> two-node clusters - set no-quorum-policy=ignore and rely on stonith to
> resolve split brain.
> 
> I dropped the idea of using qdevice in a two-node cluster. If you have
> a reliable stonith device it is not needed, and without stonith,
> relying on watchdog suicide has too many problems.
> 

Andrei,
in a two-node cluster with stonith only, but no qdevice, how do you 
avoid the dreaded stonith death match, and the resultant flip-flopping 
of services?

and are you using this configuration with stateful services? my main use 
case is DRBD, so i am very cautious of making sure that there is no data 
corruption, or disruption. so the qdevice is a part of my "belt and 
suspenders" approach.
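
for reference, the tie-break described in the quoted text can be sketched as a toy model. this is only an illustration of the documented rule (equal-size partitions, lowest node id wins), not qnetd's actual code; the function name pick_partition and the node ids 1 and 2 are assumptions made up for the example, with the rebooted node assumed to hold the lowest id.

```python
def pick_partition(partitions):
    """Toy model of a tie-breaking arbiter (hypothetical, not qnetd code).

    Each partition is a non-empty set of node ids. The largest partition
    wins; among equal-size partitions, the one holding the lowest node id
    wins.
    """
    return min(partitions, key=lambda p: (-len(p), min(p)))


# During the race (assumed ids): the rebooted node 1 reaches the arbiter
# before it has joined node 2, so the arbiter sees two one-node partitions.
during_race = [frozenset({1}), frozenset({2})]

# Once both nodes see each other there is a single partition again.
after_join = [frozenset({1, 2})]

print(sorted(pick_partition(during_race)))
print(sorted(pick_partition(after_join)))
```

in this model the existing node (id 2) is in the losing partition for as long as the two nodes have not yet seen each other, which matches the brief loss of quorum described above.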

