[ClusterLabs] temporary loss of quorum when member starts to rejoin

Thu Apr 9 06:43:57 EDT 2020

Sherrard Burton wrote:
> 
> 
> On 4/8/20 1:09 PM, Andrei Borzenkov wrote:
>> 08.04.2020 10:12, Jan Friesse wrote:
>>> Sherrard,
>>>
>>>> i could not determine which of these sub-threads to include this in,
>>>> so i am going to (reluctantly) top-post it.
>>>>
>>>> i switched the transport to udp, and in limited testing i seem to not
>>>> be hitting the race condition. of course i have no idea whether this
>>>> will behave consistently, or which part of the knet vs udp setup makes
>>>> the most difference.
>>>>
>>>> ie, is it the overhead of the crypto handshakes/setup? is there some
>>>> other knet layer that imparts additional delay in establishing
>>>> connection to other nodes? is the delay on the rebooted node, the
>>>> standing node, or both?
>>>>
>>>
>>> At a very high level, this is what happens in corosync when using udpu:
>>> - Corosync starts in the gather state and sends a "multicast" message
>>> (emulated by unicast to all expected members) saying "I'm here and
>>> this is my view of live nodes".
>>> - In this state, corosync waits for answers.
>>> - When a node receives this message, it "multicasts" the same message
>>> with its updated view of live nodes.
>>> - After all nodes agree, they move to the next state (commit/recovery
>>> and finally operational).
>>>
>>> With udp, this happens instantly, so most of the time corosync doesn't
>>> even create a single-node membership. That membership is only created
>>> if no other nodes exist and/or replies are not delivered in time.
>>>
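
As a rough illustration of the gather step described above, here is a toy sketch in Python. It is not corosync code: the function form_membership and its parameters are invented for this example, and the real protocol involves several rounds of messages rather than a single set intersection.

```python
def form_membership(me, expected, replies_in_time):
    """Return the membership a node would settle on after gathering.

    me:              name of the local node (hypothetical identifier)
    expected:        all nodes the local node expects to hear from
    replies_in_time: peers whose join messages arrived before the timeout
    """
    # Only expected peers that answered in time are counted as live.
    members = {me} | (set(expected) & set(replies_in_time))
    return sorted(members)


expected = ["node1", "node2"]

# Replies arrive immediately (the udp case described above).
udp_view = form_membership("node1", expected, ["node2"])

# No reply arrives before the timeout, so the node ends up alone.
late_view = form_membership("node1", expected, [])
```

In the first call the node goes straight to a two-node membership; in the second it forms the single-node membership that causes the trouble discussed in this thread.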
>>
>> Is it possible to delay "creating single node membership" until some
>> reasonable initial timeout after corosync starts, to ensure the node's
>> view of the cluster is up to date? It is clear that there will always be
>> some corner cases, but at least this would make an "obviously correct"
>> configuration behave as expected.
>>
>> Corosync must already have a timeout for declaring peers unreachable; it
>> seems most logical to use it in this case.
>>
> 
> i tossed that idea around in my head as well. basically, if there was an 
> analogue of client_leaving called client_joining, it could be used to 
> allow the qdevice to return 'ask later'.

It is there.

> 
> i think the trade-off here is that you sacrifice some responsiveness in 
> your failover times, since (i'm guessing) the timeout for declaring 
> peers unreachable errs on the side of caution.
> 
> the other hairy bit is determining the difference between a new 
> (illegitimate) single-node membership, and the existing (legitimate) 
> single-node membership. both are equally legitimate from the standpoint 
> of each client, which can see the qdevice, but not the peer, and from 
> the standpoint of the qdevice, which can see both clients.

Yep. Actually, this is a situation I hadn't thought about. It is 
quite special, because with more than 2 nodes it works as it should 
(a single-node partition never gets a vote then). That doesn't mean the 
2-node cluster is unimportant; quite the opposite, this is where qdevice 
makes sense.

> 
> as such, i suspect that this all comes right back to figuring out how to 
> implement issue #7.

It's not hard, it is just quite a lot of work. I'm on it, but I have 
no ETA yet (and of course the current situation in real life doesn't help 
much). When I have something, I will let you know, and I would be happy if 
you were able to test it.

Regards,
   Honza

> 
> 
>>>
>>> Knet adds a layer which monitors the links between each pair of nodes,
>>> and it makes a link active only after it has received a configured
>>> number of "pong" packets. The idea is to have evidence of a reasonably
>>> stable link. As long as the link is not active, no data packets go
>>> through (corosync traffic is just "data"). This basically means that the
>>> initial corosync "multicast" is not delivered to the other nodes, so
>>> corosync creates a single-node membership. After the link becomes
>>> active, the "multicast" is delivered to the other nodes and they move to
>>> the gather state.
>>>
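
The link gating described above can be sketched as a small toy model in Python. This is only an illustration under stated assumptions, not knet's implementation: the Link class, its methods, and the default threshold of 2 are made up for the example, and real knet also uses ping intervals and timeouts that are ignored here.

```python
class Link:
    """Toy model of a link that passes data only after enough pongs."""

    def __init__(self, pong_count=2):
        # Number of pongs required before the link counts as active
        # (hypothetical default, chosen only for this example).
        self.pong_count = pong_count
        self.pongs_seen = 0
        self.delivered = []

    @property
    def active(self):
        return self.pongs_seen >= self.pong_count

    def receive_pong(self):
        self.pongs_seen += 1

    def send_data(self, packet):
        """Deliver the packet if the link is active, otherwise drop it."""
        if not self.active:
            return False
        self.delivered.append(packet)
        return True


link = Link(pong_count=2)

# The first join message is sent right after startup and is dropped.
first_join_delivered = link.send_data("join")

# Two pongs arrive, so the link becomes active.
link.receive_pong()
link.receive_pong()

# A later join message now reaches the peer.
second_join_delivered = link.send_data("join")
```

The dropped first message corresponds to the window in which the restarted node sees no peers and forms a single-node membership.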
>>
>> I would expect a "reasonable timeout" to also take the knet delay into account.
>>
>>> So, to answer your question: the "delay" is on both nodes' side, because
>>> the link is not yet established between the nodes.
>>>
>>
>> knet was expected to improve things, wasn't it? :)
>>
>