[ClusterLabs] temporary loss of quorum when member starts to rejoin

Sherrard Burton sb-clusterlabs at allafrica.com
Mon Apr 6 17:21:50 EDT 2020



On 4/6/20 4:10 PM, Andrei Borzenkov wrote:
> 06.04.2020 20:57, Sherrard Burton wrote:
>>
>>
>> On 4/6/20 1:20 PM, Sherrard Burton wrote:
>>>
>>>
>>> On 4/6/20 12:35 PM, Andrei Borzenkov wrote:
>>>> 06.04.2020 17:05, Sherrard Burton wrote:
>>>>>
>>>>> from the quorum node:
>>> ...
>>>>> Apr 05 23:10:17 debug   Client ::ffff:192.168.250.50:54462 (cluster xen-nfs01_xen-nfs02, node_id 1) sent quorum node list.
>>>>> Apr 05 23:10:17 debug     msg seq num = 6
>>>>> Apr 05 23:10:17 debug     quorate = 0
>>>>> Apr 05 23:10:17 debug     node list:
>>>>> Apr 05 23:10:17 debug       node_id = 1, data_center_id = 0, node_state = member
>>>>
>>>> Oops. How come the node that was rebooted formed a cluster all by
>>>> itself, without seeing the second node? Do you have two_node and/or
>>>> wait_for_all configured?
>>>>
>>
>> i never thought to check the logs on the rebooted server. hopefully
>> someone can extract some further useful information here:
>>
>>
>> https://pastebin.com/imnYKBMN
>>
> 
> It looks like a timing issue or race condition. After reboot, the node
> manages to contact qnetd first, before the connection to the other node is
> established. Qnetd behaves as documented: it sees two equal-size
> partitions and favors the partition that includes the tie breaker (lowest
> node id). So the existing node goes out of quorum. A second later both nodes
> see each other, and so quorum is regained.


thank you for taking the time to trawl through my debugging output. your 
explanation seems to accurately describe what i am experiencing. of 
course i have no idea how to remedy it. :-)
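
for anyone following along, the tie-break described above can be sketched as a toy model. this is only an illustration of the behavior as andrei explains it, not qnetd's actual code: the function name pick_quorate_partition and the representation of partitions as sets of node ids are made up for the example, and the lowest-node-id default is taken from his description.

```python
from typing import FrozenSet, List, Optional


def pick_quorate_partition(
    partitions: List[FrozenSet[int]],
    tie_breaker: Optional[int] = None,
) -> Optional[FrozenSet[int]]:
    """Toy model of the 50/50 tie-break; NOT qnetd's real implementation.

    partitions: the node-id sets that each connected partition reports.
    tie_breaker: node id that wins an even split (assumed default: lowest id).
    """
    if not partitions:
        return None

    # Assumed default from the thread: the lowest node id breaks ties.
    if tie_breaker is None:
        tie_breaker = min(set().union(*partitions))

    largest = max(len(p) for p in partitions)
    candidates = [p for p in partitions if len(p) == largest]

    # A single largest partition wins outright.
    if len(candidates) == 1:
        return candidates[0]

    # Equal-size partitions: favor the one containing the tie breaker.
    for p in candidates:
        if tie_breaker in p:
            return p
    return None


# rebooted node 1 reaches qnetd before it sees node 2: two partitions of size 1
during_race = pick_quorate_partition([frozenset({1}), frozenset({2})])

# a moment later both nodes see each other: one partition of size 2
after_join = pick_quorate_partition([frozenset({1, 2})])
```

in this model the freshly rebooted node 1 wins the even split, which matches the running node briefly losing quorum until the two nodes see each other and form a single partition.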

> 
> I cannot reproduce it, but I also do not use knet. From the documentation I
> have the impression that knet has an artificial delay before it considers links
> operational, so maybe that is the reason.

i will do some reading on how knet factors into all of this and respond 
with any questions or discoveries.

> 
>>>
>>> BTW, great eyes. i had not picked up on that little nuance. i had
>>> pored over this particular log a number of times, but it was very
>>> hard for me to discern the starting and stopping points for each
>>> logical group of messages. the indentation made some of it clear. but
>>> when you have a series of lines beginning in the left-most column, it
>>> is not clear whether they belong to the previous group, the next
>>> group, or they are their own group.
>>>
>>> just wanted to note my confusion in case the relevant maintainer
>>> happens across this thread.
>>>
>>> thanks again
>>> _______________________________________________
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/
> 

