[ClusterLabs] temporary loss of quorum when member starts to rejoin

Sherrard Burton sb-clusterlabs at allafrica.com
Tue Apr 7 09:42:03 EDT 2020


i could not determine which of these sub-threads to include this in, so 
i am going to (reluctantly) top-post it.

i switched the transport to udp, and in limited testing i seem to not be 
hitting the race condition. of course i have no idea whether this will 
behave consistently, or which part of the knet vs udp setup makes the 
most difference.

i.e., is it the overhead of the crypto handshakes/setup? is there some 
other knet layer that imparts additional delay in establishing 
connection to other nodes? is the delay on the rebooted node, the 
standing node, or both?

ultimately i have to remind myself that "a race condition is a race 
condition", and that you can't chase microsecond improvements that may 
lessen the chance of triggering it. you have to solve the underlying 
problem.
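
for reference, the transport switch amounts to a one-line change in 
corosync.conf. this is only a sketch (the cluster name is the one from 
the logs below; everything else about the real config is illustrative):

```
totem {
    version: 2
    cluster_name: xen-nfs01_xen-nfs02
    # was: transport: knet (the corosync 3 default)
    # plain udp skips knet's per-link handshake and crypto setup
    transport: udp
}
```

as far as i understand, plain udp does not support the knet crypto 
options at all, so crypto is entirely out of the picture in this test.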


thanks again folks, for your help, and the great work you are doing.


On 4/7/20 4:09 AM, Jan Friesse wrote:
> Sherrard and Andrei
> 
>>
>>
>> On 4/6/20 4:10 PM, Andrei Borzenkov wrote:
>>> 06.04.2020 20:57, Sherrard Burton wrote:
>>>>
>>>>
>>>> On 4/6/20 1:20 PM, Sherrard Burton wrote:
>>>>>
>>>>>
>>>>> On 4/6/20 12:35 PM, Andrei Borzenkov wrote:
>>>>>> 06.04.2020 17:05, Sherrard Burton wrote:
>>>>>>>
>>>>>>> from the quorum node:
>>>>> ...
>>>>>>> Apr 05 23:10:17 debug   Client ::ffff:192.168.250.50:54462 (cluster
>>>>>>> xen-nfs01_xen-nfs02, node_id 1) sent quorum node list.
>>>>>>> Apr 05 23:10:17 debug     msg seq num = 6
>>>>>>> Apr 05 23:10:17 debug     quorate = 0
>>>>>>> Apr 05 23:10:17 debug     node list:
>>>>>>> Apr 05 23:10:17 debug       node_id = 1, data_center_id = 0, node_state = member
>>>>>>
>>>>>> Oops. How come the node that was rebooted formed a cluster all by
>>>>>> itself, without seeing the second node? Do you have two_node and/or
>>>>>> wait_for_all configured?
>>>>>>
>>>>
>>>> i never thought to check the logs on the rebooted server. hopefully
>>>> someone can extract some further useful information here:
>>>>
>>>>
>>>> https://pastebin.com/imnYKBMN
>>>>
>>>
>>> It looks like some timing issue or race condition. After reboot the
>>> node manages to contact qnetd first, before the connection to the other
>>> node is established. Qnetd behaves as documented: it sees two equal-size
>>> partitions and favors the partition that includes the tie breaker (lowest
>>> node id). So the existing node goes out of quorum. A second later both
>>> nodes see each other, and quorum is regained.
> 
> Nice catch
> 
>>
>>
>> thank you for taking the time to trawl through my debugging output. 
>> your explanation seems to accurately describe what i am experiencing. 
>> of course i have no idea how to remedy it. :-)
> 
> It really is quite a problem. Honestly, I don't think there is a way to 
> remedy this behavior other than implementing an option to prefer the 
> active partition as the tie-breaker 
> (https://github.com/corosync/corosync-qdevice/issues/7).
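
for context, the tie-breaker qnetd applied here is a static setting. a 
sketch of the relevant corosync.conf section, per the corosync-qdevice 
documentation as i understand it (the host name is a placeholder):

```
quorum {
    provider: corosync_votequorum
    device {
        model: net
        net {
            host: qnetd-host.example.com
            algorithm: ffsplit
            # lowest (default) | highest | a specific node id --
            # there is currently no "active partition" choice,
            # which is what the issue above asks for
            tie_breaker: lowest
        }
    }
}
```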
> 
> 
>>
>>>
>>> I cannot reproduce it, but I also do not use knet. From the
>>> documentation I have the impression that knet has an artificial delay
>>> before it considers links operational, so maybe that is the reason.
>>
>> i will do some reading on how knet factors into all of this and 
>> respond with any questions or discoveries.
> 
> knet_pong_count/knet_ping_interval tuning may help, but I don't think 
> there is a way to prevent the creation of a single-node membership in 
> all possible cases.
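
the knobs Honza mentions live in the totem section. a sketch with purely 
illustrative values (the right numbers depend on the token timeout and 
link quality; this can narrow the race window but, as he says, cannot 
close it):

```
totem {
    transport: knet
    # ping each link this often (ms)
    knet_ping_interval: 200
    # declare a link down after this long without a pong (ms)
    knet_ping_timeout: 500
    # pongs required before a link is considered up again
    knet_pong_count: 2
}
```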
> 
>>
>>>
>>>>>
>>>>> BTW, great eyes. i had not picked up on that little nuance. i had
>>>>> pored through this particular log a number of times, but it was very
>>>>> hard for me to discern the starting and stopping points for each
>>>>> logical group of messages. the indentation made some of it clear. but
>>>>> when you have a series of lines beginning in the left-most column, it
>>>>> is not clear whether they belong to the previous group, the next
>>>>> group, or they are their own group.
>>>>>
>>>>> just wanted to note my confusion in case the relevant maintainer
>>>>> happens across this thread.
> 
> Here :)
> 
> The output (especially the debug output) really is a bit cryptic, but 
> I'm not entirely sure how to make it better. Qnetd events have no strict 
> ordering, so I don't see a way to group related events without some kind 
> of reordering and best-guessing, which I'm not too keen to do. Also, 
> some of the messages relate to specific nodes and some relate to the 
> whole cluster (or part of the cluster).
> 
> Of course I'm open to ideas on how to structure it better.
> 
> Regards,
>    Honza
> 
> 
>>>>>
>>>>> thanks again
>>>>> _______________________________________________
>>>>> Manage your subscription:
>>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>>
>>>>> ClusterLabs home: https://www.clusterlabs.org/
>>>
> 


More information about the Users mailing list