[ClusterLabs] temporary loss of quorum when member starts to rejoin
Sherrard Burton
sb-clusterlabs at allafrica.com
Tue Apr 7 09:21:50 EDT 2020
On 4/7/20 4:09 AM, Jan Friesse wrote:
> Sherrard and Andrei
>
>>
>>
>> On 4/6/20 4:10 PM, Andrei Borzenkov wrote:
>>> 06.04.2020 20:57, Sherrard Burton wrote:
>>>>
>>>>
>>>> On 4/6/20 1:20 PM, Sherrard Burton wrote:
>>>>>
>>>>>
>>>>> On 4/6/20 12:35 PM, Andrei Borzenkov wrote:
>>>>>> 06.04.2020 17:05, Sherrard Burton wrote:
>>>>>>>
>>>>>>> from the quorum node:
>>>>> ...
>>>>>>> Apr 05 23:10:17 debug Client ::ffff:192.168.250.50:54462 (cluster
>>>>>>> xen-nfs01_xen-nfs02, node_id 1) sent quorum node list.
>>>>>>> Apr 05 23:10:17 debug msg seq num = 6
>>>>>>> Apr 05 23:10:17 debug quorate = 0
>>>>>>> Apr 05 23:10:17 debug node list:
>>>>>>> Apr 05 23:10:17 debug node_id = 1, data_center_id = 0,
>>>>>>> node_state
>>>>>>> = member
>>>>>>
>>>>>> Oops. How come the node that was rebooted formed a cluster all by
>>>>>> itself, without seeing the second node? Do you have two_node and/or
>>>>>> wait_for_all configured?
>>>>>>
>>>>
>>>> i never thought to check the logs on the rebooted server. hopefully
>>>> someone can extract some further useful information here:
>>>>
>>>>
>>>> https://pastebin.com/imnYKBMN
>>>>
>>>
>>> It looks like some timing issue or race condition. After reboot, the node
>>> manages to contact qnetd first, before the connection to the other node is
>>> established. Qnetd behaves as documented - it sees two equal size
>>> partitions and favors the partition that includes the tie breaker (lowest
>>> node id). So the existing node goes out of quorum. A second later both
>>> nodes see each other and so quorum is regained.
>
> Nice catch
>
>>
>>
>> thank you for taking the time to troll through my debugging output.
>> your explanation seems to accurately describe what i am experiencing.
>> of course i have no idea how to remedy it. :-)
>
> It is really quite a problem. Honestly, I don't think there is really a
> way to remedy this behavior other than to implement an option to prefer
> the active partition as a tie-breaker
> (https://github.com/corosync/corosync-qdevice/issues/7).
>
>
>>
>>>
>>> I cannot reproduce it, but I also do not use knet. From the documentation I
>>> have the impression that knet has an artificial delay before it considers
>>> links operational, so maybe that is the reason.
>>
>> i will do some reading on how knet factors into all of this and
>> respond with any questions or discoveries.
>
> knet_pong_count/knet_ping_interval tuning may help, but I don't think
> there is really a way to prevent creation of single node membership in
> all possible cases.
yes. in my limited thinking about it, i keep coming back around to that
conclusion in the two-node + qdevice case, barring implementation of #7.
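
as a rough illustration of the race Andrei describes, here is a minimal
python sketch of a "lowest node id" tie-breaker. it is a simplified model
under stated assumptions, not qnetd's actual algorithm or code: the
function name favored_partition and the representation of partitions as
sets of node ids are hypothetical, and i am assuming node 1 is the
rebooted node, as the log excerpt suggests.

```python
# Simplified model of a lowest-node-id tie-breaker (hypothetical names;
# this is NOT qnetd's real implementation).
def favored_partition(partitions):
    """Return the partition the tie-breaker would favor.

    partitions: list of non-empty sets of node ids, as seen by the
    quorum device. The larger partition wins; if sizes are equal, the
    partition containing the lowest node id wins.
    """
    return max(partitions, key=lambda p: (len(p), -min(p)))


# Node 1 has just rebooted and reaches the quorum device before it sees
# node 2, so the device sees two one-node partitions of equal size.
during_race = [{1}, {2}]
winner_during_race = favored_partition(during_race)  # node 1's partition

# Shortly afterwards both nodes see each other and form one partition.
after_join = [{1, 2}]
winner_after_join = favored_partition(after_join)
```

in this model the already-running node 2 loses the vote during the race
even though it is the active one, which is the behavior that a "prefer
active partition" tie-breaker (#7) would be meant to avoid.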
>
>>
>>>
>>>>>
>>>>> BTW, great eyes. i had not picked up on that little nuance. i had
>>>>> pored over this particular log a number of times, but it was very
>>>>> hard for me to discern the starting and stopping points for each
>>>>> logical group of messages. the indentation made some of it clear. but
>>>>> when you have a series of lines beginning in the left-most column, it
>>>>> is not clear whether they belong to the previous group, the next
>>>>> group, or they are their own group.
>>>>>
>>>>> just wanted to note my confusion in case the relevant maintainer
>>>>> happens across this thread.
>
> Here :)
>
> Output (especially debug one) is really a bit cryptic, but I'm not
> entirely sure how to make it better. Qnetd events have no strict
> ordering so I don't see a way to group relevant events without some
> kind of reordering and best guessing, which I'm not too keen to do. Also
> some of the messages relate to specific nodes and some of the messages
> relate to the whole cluster (or part of the cluster).
>
> Of course I'm open to ideas on how to structure it in a better way.
i wish i was well-versed enough in this particular codebase to submit a
PR. i think that some kind of tagging indicating whether messages are
node-specific or cluster-specific would probably help a bit. but
ultimately it is probably not worth the effort of changing the code, as
long as the relevant parties can easily analyze the output.
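
to make the tagging idea concrete, here is a small python sketch of what
i have in mind. it is purely illustrative: the tag_line helper and the
"[node N]" / "[cluster]" prefixes are hypothetical and are not an
existing qnetd log format or option.

```python
# Hypothetical helper: prefix each debug message with its scope so a
# reader can tell node-specific lines from cluster-wide ones.
def tag_line(message, node_id=None):
    """Return message prefixed with '[node N]' or '[cluster]'."""
    scope = f"[node {node_id}]" if node_id is not None else "[cluster]"
    return f"{scope} {message}"


tagged = [
    tag_line("sent quorum node list", node_id=1),
    tag_line("node_state = member", node_id=1),
    tag_line("quorate = 0"),
]
```

with a prefix like this, a run of lines starting in the left-most column
would at least show which node (if any) each one refers to.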
>
> Regards,
> Honza