[ClusterLabs] temporary loss of quorum when member starts to rejoin

Jan Friesse jfriesse at redhat.com
Wed Apr 8 02:58:30 EDT 2020


Sherrard,

> 
> 
> On 4/7/20 4:09 AM, Jan Friesse wrote:
>> Sherrard and Andrei
>>
>>>
>>>
>>> On 4/6/20 4:10 PM, Andrei Borzenkov wrote:
>>>> 06.04.2020 20:57, Sherrard Burton пишет:
>>>>>
>>>>>
>>>>> On 4/6/20 1:20 PM, Sherrard Burton wrote:
>>>>>>
>>>>>>
>>>>>> On 4/6/20 12:35 PM, Andrei Borzenkov wrote:
>>>>>>> 06.04.2020 17:05, Sherrard Burton пишет:
>>>>>>>>
>>>>>>>> from the quorum node:
>>>>>> ...
>>>>>>>> Apr 05 23:10:17 debug   Client ::ffff:192.168.250.50:54462 (cluster
>>>>>>>> xen-nfs01_xen-nfs02, node_id 1) sent quorum node list.
>>>>>>>> Apr 05 23:10:17 debug     msg seq num = 6
>>>>>>>> Apr 05 23:10:17 debug     quorate = 0
>>>>>>>> Apr 05 23:10:17 debug     node list:
>>>>>>>> Apr 05 23:10:17 debug       node_id = 1, data_center_id = 0, node_state = member
>>>>>>>
>>>>>>> Oops. How come the node that was rebooted formed a cluster all by
>>>>>>> itself, without seeing the second node? Do you have two_node and/or
>>>>>>> wait_for_all configured?
>>>>>>>
>>>>>
>>>>> i never thought to check the logs on the rebooted server. hopefully
>>>>> someone can extract some further useful information here:
>>>>>
>>>>>
>>>>> https://pastebin.com/imnYKBMN
>>>>>
>>>>
>>>> It looks like some timing issue or race condition. After the reboot,
>>>> the node manages to contact qnetd first, before the connection to the
>>>> other node is established. Qnetd behaves as documented: it sees two
>>>> equal-size partitions and favors the partition that includes the tie
>>>> breaker (lowest node id). So the existing node goes out of quorum. A
>>>> second later both nodes see each other, and quorum is regained.
>>
>> Nice catch
>>
>>>
>>>
>>> thank you for taking the time to trawl through my debugging output. 
>>> your explanation seems to accurately describe what i am experiencing. 
>>> of course i have no idea how to remedy it. :-)
>>
>> It really is quite a problem. Honestly, I don't think there is a way 
>> to remedy this behavior other than implementing an option to prefer 
>> the active partition as a tie-breaker 
>> (https://github.com/corosync/corosync-qdevice/issues/7).
>>
>>
>>>
>>>>
>>>> I cannot reproduce it, but I also do not use knet. From the
>>>> documentation I have the impression that knet has an artificial delay
>>>> before it considers links operational, so maybe that is the reason.
>>>
>>> i will do some reading on how knet factors into all of this and 
>>> respond with any questions or discoveries.
>>
>> knet_pong_count/knet_ping_interval tuning may help, but I don't think 
>> there is really a way to prevent the creation of a single-node 
>> membership in all possible cases.
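
For reference, those knobs are totem options in corosync.conf; a minimal
illustrative sketch (the values here are examples, not recommendations):

    totem {
        version: 2
        transport: knet
        # how often each knet link is pinged, in ms; lowering this lets
        # links be detected as up sooner after a node reboots
        knet_ping_interval: 200
        # number of pong replies required before a link is considered up
        knet_pong_count: 2
    }

Even with aggressive values, there is still a window between corosync
starting and the links coming up, which is why this only narrows the race
rather than eliminating it.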
> 
> yes. in my limited thinking about it, i keep coming back around to that 
> conclusion in the two-node + qdevice case, barring implementation of #7.
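
For context, the settings in play live in the quorum section of
corosync.conf; a minimal sketch of a two-node + qdevice setup (the qnetd
host name is illustrative):

    quorum {
        provider: corosync_votequorum
        # note: two_node must NOT be set when a quorum device is configured
        device {
            model: net
            votes: 1
            net {
                host: qnetd.example.com   # illustrative qnetd host
                algorithm: ffsplit
                # "lowest" is the default and is the behavior discussed
                # above: on a 50:50 split, the partition containing the
                # lowest node id wins
                tie_breaker: lowest
            }
        }
    }
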
> 
> 
>>
>>>
>>>>
>>>>>>
>>>>>> BTW, great eyes. i had not picked up on that little nuance. i had
>>>>>> pored through this particular log a number of times, but it was very
>>>>>> hard for me to discern the starting and stopping points for each
>>>>>> logical group of messages. the indentation made some of it clear. but
>>>>>> when you have a series of lines beginning in the left-most column, it
>>>>>> is not clear whether they belong to the previous group, the next
>>>>>> group, or are their own group.
>>>>>>
>>>>>> just wanted to note my confusion in case the relevant maintainer
>>>>>> happens across this thread.
>>
>> Here :)
>>
>> The output (especially the debug output) is really a bit cryptic, but 
>> I'm not entirely sure how to make it better. Qnetd events have no 
>> strict ordering, so I don't see a way to group relevant events without 
>> some kind of reordering and best-guessing, which I'm not too keen to 
>> do. Also, some of the messages relate to specific nodes and some 
>> relate to the whole cluster (or part of the cluster).
>>
>> Of course I'm open to ideas on how to structure it in a better way.
> 
> i wish i was well-versed enough in this particular codebase to submit a 
> PR. i think that some kind of tagging indicating whether messages are 

Oh, a PR is not really needed. For me it would be enough to see an 
example of a better-structured log.
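
Purely as a hypothetical illustration (this is not existing qnetd
output), a per-line scope tag along the lines suggested above might look
like:

    Apr 05 23:10:17 debug [cluster xen-nfs01_xen-nfs02] node_id 1 sent quorum node list
    Apr 05 23:10:17 debug [node 1]   msg seq num = 6
    Apr 05 23:10:17 debug [node 1]   quorate = 0

so the reader can tell at a glance whether a line belongs to one node or
to the whole cluster.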

Honza

> node-specific or cluster-specific would probably help a bit. but 
> ultimately it is probably not worth the effort of changing the code, as 
> long as the relevant parties can easily analyze the output.
> 
>>
>> Regards,
>>    Honza
> 



More information about the Users mailing list