[ClusterLabs] temporary loss of quorum when member starts to rejoin
Jan Friesse
jfriesse at redhat.com
Thu Apr 9 06:37:09 EDT 2020
Sherrard Burton wrote:
>
>
> On 4/7/20 4:09 AM, Jan Friesse wrote:
>> Sherrard and Andrei
>>>
>>>
>>> On 4/6/20 4:10 PM, Andrei Borzenkov wrote:
>>>> 06.04.2020 20:57, Sherrard Burton wrote:
>>>>
>>>> It looks like a timing issue or race condition. After the reboot, the
>>>> node manages to contact qnetd first, before the connection to the other
>>>> node is established. Qnetd behaves as documented: it sees two equal-size
>>>> partitions and favors the partition that includes the tie-breaker (lowest
>>>> node id). So the existing node goes out of quorum. A second later both
>>>> nodes see each other, and quorum is regained.
>>
>> Nice catch
>>
>>>
>>>
>>> thank you for taking the time to troll through my debugging output.
>>> your explanation seems to accurately describe what i am experiencing.
>>> of course i have no idea how to remedy it. :-)
>>
>> It is really quite a problem. Honestly, I don't think there is really
>> a way to remedy this behavior other than implementing an option to
>> prefer the active partition as a tie-breaker
>> (https://github.com/corosync/corosync-qdevice/issues/7).
>
> Jan,
> my curiosity got the best of me, so i spent some time trying to orient
> myself to the inner workings of ffsplit.
>
> a) how would one identify the current active partition? i might be
> starting too far in (or missing something), but it seems that by the
> time we are in qnetd_algo_ffsplit_partition_cmp(), we are comparing two
> sets of clients and node lists without the kind of context that would
> allow us to identify the current active partition. i could not easily
> identify the object that we would interrogate to answer that question.
This is not tracked yet. The first thing to do is to add the last known
membership to the qnetd cluster structure.
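To make the idea of tracking the last known membership concrete, here is a minimal sketch in Python (not qnetd's C code) of how a remembered membership could break a tie between two equal-size partitions. Every name in it (`Cluster`, `last_membership`, `pick_partition`) is hypothetical and does not exist in corosync-qdevice. The real ffsplit algorithm also checks things this toy model ignores, such as whether a partition holds at least half of the configured nodes.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical stand-in for the qnetd cluster structure."""
    tie_breaker_node_id: int
    # Assumption: node ids of the partition that most recently got the vote.
    last_membership: frozenset = frozenset()


def pick_partition(cluster, a, b):
    """Return whichever of the two partitions (sets of node ids) gets the vote.

    Order of preference in this toy model:
      1. the larger partition,
      2. the partition sharing more nodes with the last known membership,
      3. the partition containing the configured tie-breaker node
         (partition b if neither contains it).
    """
    if len(a) != len(b):
        winner = a if len(a) > len(b) else b
    else:
        overlap_a = len(a & cluster.last_membership)
        overlap_b = len(b & cluster.last_membership)
        if overlap_a != overlap_b:
            winner = a if overlap_a > overlap_b else b
        else:
            winner = a if cluster.tie_breaker_node_id in a else b
    # Remember the winner so the next comparison knows who was active.
    cluster.last_membership = frozenset(winner)
    return winner
```

In the two-node case from this thread (tie-breaker is node 1, node 1 reboots, node 2 keeps running), the first call records {2} as the active membership. When node 1 reaches qnetd before it sees node 2, the comparison is {1} against {2}, and the remembered membership favors {2} instead of the lowest node id.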
>
> b) is it possible to manage client->tie_breaker.mode and
> client->tie_breaker.node_id dynamically to achieve the desired goal? ie,
> if we are in a two-node cluster and one node leaves, can we "push"
> values to the remaining client such that client->tie_breaker.mode ==
> TLV_TIE_BREAKER_MODE_NODE_ID and client->tie_breaker.node_id ==
> client->node_id?
That would be possible, and it would probably work quite well for a
2-node cluster. But imagine a cluster with more nodes; this is where
things become interesting.
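As a rough illustration of the dynamic tie-breaker idea and where it stops being obvious, the Python sketch below pins the tie-breaker to the remaining node only when exactly one node is left. The `TieBreaker` class, the `MODE_*` constants (loose stand-ins for the TLV_TIE_BREAKER_MODE_* values) and `retarget()` are all hypothetical; nothing like this exists in corosync-qdevice today.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the TLV_TIE_BREAKER_MODE_* values.
MODE_LOWEST = "lowest"
MODE_NODE_ID = "node_id"


@dataclass
class TieBreaker:
    mode: str = MODE_LOWEST
    node_id: Optional[int] = None


def retarget(tie_breaker, survivors):
    """Pin the tie-breaker to the sole surviving node.

    Returns True if the tie-breaker was changed. With more than one
    survivor there is no single obvious node to pin it to, so this
    sketch leaves the configuration untouched and returns False.
    """
    if len(survivors) == 1:
        tie_breaker.mode = MODE_NODE_ID
        tie_breaker.node_id = next(iter(survivors))
        return True
    return False
```

For a two-node cluster, `retarget(tb, {2})` switches the mode and pins node 2. For a four-node cluster where nodes 2, 3 and 4 survive, the function does nothing: whichever single node is picked could later end up in either half of a 2/2 split, for example {1, 2} against {3, 4}, so one node id no longer identifies the active partition.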
Regards,
Honza
>
> of course i may be way off base with all of this. just wanted to ask
> before i extracted myself from the rabbit hole.
>