[ClusterLabs] temporary loss of quorum when member starts to rejoin

Thu Apr 9 06:31:37 EDT 2020

Andrei Borzenkov wrote:
> 08.04.2020 10:12, Jan Friesse wrote:
>> Sherrard,
>>
>>> i could not determine which of these sub-threads to include this in,
>>> so i am going to (reluctantly) top-post it.
>>>
>>> i switched the transport to udp, and in limited testing i seem to not
>>> be hitting the race condition. of course i have no idea whether this
>>> will behave consistently, or which part of the knet vs udp setup makes
>>> the most difference.
>>>
>>> ie, is it the overhead of the crypto handshakes/setup? is there some
>>> other knet layer that imparts additional delay in establishing
>>> connection to other nodes? is the delay on the rebooted node, the
>>> standing node, or both?
>>>
>>
>> Very high level, what is happening in corosync when using udpu:
>> - Corosync starts and begins in gather state -> it sends a "multicast"
>> (emulated by unicast to all expected members) message saying "I'm here
>> and this is my view of live nodes".
>> - In this state, corosync waits for answers
>> - When a node receives this message, it "multicasts" the same message
>> with an updated view of live nodes
>> - After all nodes agree, they move to the next state (commit/recovery
>> and finally operational)
>>
>> With udp, this happens instantly, so most of the time corosync doesn't
>> even create a single-node membership, which would be created if no
>> other nodes existed and/or replies weren't delivered in time.
>>
> 
> Is it possible to delay "creating single node membership" until some
> reasonable initial timeout after corosync starts to ensure the node's view of

The thing is, totemsrp begins by creating a single-node membership. It
has to start somewhere. Of course, the question is whether it would make
sense to slow down a bit at startup to create a "better" membership. I
would say so, and it is something I'm considering as a TODO.
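
To make the startup race concrete, here is a toy sketch in Python. It is
not corosync code: the function name, the parameters and the rule "peers
are heard only if the link is active before the gather window closes" are
all assumptions made purely for illustration.

```python
def first_membership(start_delay_ms, link_active_at_ms, join_timeout_ms, peers):
    """Return the first membership a freshly started node would form.

    Toy model only: the node gathers from start_delay_ms for join_timeout_ms;
    peers are heard only if the link is active before that window closes.
    """
    gather_ends_ms = start_delay_ms + join_timeout_ms
    if link_active_at_ms <= gather_ends_ms:
        # Join messages get through in time, so peers are included.
        return sorted(["self", *peers])
    # Nothing was delivered before the timeout: single-node membership.
    return ["self"]


peers = ["node2", "node3"]
print(first_membership(0, 0, 50, peers))      # udp-like: link usable at once
print(first_membership(0, 400, 50, peers))    # knet-like: link active later
print(first_membership(400, 400, 50, peers))  # same link, delayed start
```

In this model only the second call ends up alone, which is the case a
slower startup would be meant to avoid.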

> the cluster is up to date? It is clear that there will always be some
> corner cases, but at least this would make an "obviously correct"
> configuration behave as expected.
> 
> Corosync must already have a timeout to declare peers unreachable - it
> sounds like the most logical one to use in this case.

It does, the join timeout, but enlarging it will generally slow down
failure detection/recovery.
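
A rough back-of-the-envelope sketch of that trade-off, in Python. The
estimate "link setup takes about pong count times ping interval" and all
the numbers below are assumptions for illustration only; they are not
corosync or knet defaults, and the function names are made up.

```python
def link_up_estimate_ms(ping_interval_ms, pong_count):
    """Rough time for a link to become active: pong_count pongs in a row."""
    return ping_interval_ms * pong_count


def wait_covers_link_setup(wait_ms, ping_interval_ms, pong_count):
    """True if waiting wait_ms would outlast the estimated link setup."""
    return wait_ms >= link_up_estimate_ms(ping_interval_ms, pong_count)


# Illustrative numbers only, not corosync or knet defaults.
print(link_up_estimate_ms(ping_interval_ms=200, pong_count=2))
print(wait_covers_link_setup(50, 200, 2))
print(wait_covers_link_setup(500, 200, 2))
```

A wait long enough to cover link setup at startup would, if it were the
same timeout used during normal operation, also be paid on every
membership change, which is why simply enlarging it is unattractive.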

> 
>>
>> Knet adds a layer which monitors the links between each of the nodes,
>> and it makes a link active after it has received the configured number
>> of "pong" packets. The idea behind this is to have evidence of a
>> reasonably stable link. As long as the link is not active, no data
>> packet goes through (corosync traffic is just "data"). This basically
>> means that the initial corosync "multicast" is not delivered to other
>> nodes, so corosync creates a single-node membership. After the link
>> becomes active, the "multicast" is delivered to other nodes and they
>> move to gather state.
>>
> 
> I would expect "reasonable timeout" to also take into account the knet delay.
> 
>> So to answer your question: the "delay" is on both nodes' side because
>> the link is not established between the nodes.
>>
> 
> knet was expected to improve things, wasn't it? :)
> 

And I believe it does :) Actually, it now behaves more "correctly" (read 
as "as specification says") than before. Anyway, I got the point, it's 
in TODO (https://github.com/corosync/corosync/issues/549)

Honza