[ClusterLabs] temporary loss of quorum when member starts to rejoin

Sherrard Burton sb-clusterlabs at allafrica.com
Wed Apr 8 13:32:59 EDT 2020

On 4/8/20 1:09 PM, Andrei Borzenkov wrote:
> 08.04.2020 10:12, Jan Friesse wrote:
>> Sherrard,
>>> i could not determine which of these sub-threads to include this in,
>>> so i am going to (reluctantly) top-post it.
>>> i switched the transport to udp, and in limited testing i seem to not
>>> be hitting the race condition. of course i have no idea whether this
>>> will behave consistently, or which part of the knet vs udp setup makes
>>> the most difference.
>>> ie, is it the overhead of the crypto handshakes/setup? is there some
>>> other knet layer that imparts additional delay in establishing
>>> connection to other nodes? is the delay on the rebooted node, the
>>> standing node, or both?
>> Very high level, what is happening in corosync when using udpu:
>> - Corosync starts in the gather state and sends a "multicast" message
>> (emulated by unicast to all expected members) saying "I'm here and
>> this is my view of live nodes".
>> - In this state, corosync waits for answers
>> - When a node receives this message, it "multicasts" the same message
>> with its updated view of live nodes
>> - After all nodes agree, they move to the next state (commit/recovery
>> and finally operational)
>> With udp, this happens instantly, so most of the time corosync doesn't
>> even create a single-node membership, which would be created if no
>> other nodes exist and/or replies aren't delivered in time.
> Is it possible to delay "creating single node membership" until some
> reasonable initial timeout after corosync starts, to ensure the node's
> view of the cluster is up to date? It is clear that there will always be
> some corner cases, but at least this would make an "obviously correct"
> configuration behave as expected.
> Corosync must already have a timeout for declaring peers unreachable;
> it sounds most logical to use that in this case.

i tossed that idea around in my head as well. basically, if there were an
analogue of client_leaving called client_joining, it could be used to
allow the qdevice to return 'ask later'.
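
a very rough sketch of what i have in mind, in python rather than the
qnetd C code. everything here is hypothetical: client_joining,
join_grace, the Vote names and the decide() helper are made up for
illustration and are not existing qnetd API. it also assumes the qdevice
can use connection age to tell a joining client from an established one.

```python
from dataclasses import dataclass
from enum import Enum


class Vote(Enum):
    ACK = "ack"
    NACK = "nack"
    ASK_LATER = "ask later"


@dataclass
class Client:
    node_id: int
    connected_at: float  # when the qdevice first saw this client connect


def decide(client, vote_holders, now, join_grace):
    """Pick a vote for one client.

    vote_holders: node ids of the partition that currently holds the vote.
    join_grace:   hypothetical window during which a newly connected
                  client is treated as still joining.
    """
    if client.node_id in vote_holders:
        # established membership keeps its vote; nothing changes for it
        return Vote.ACK
    if now - client.connected_at < join_grace:
        # the hypothetical client_joining case: the client's view may
        # still be incomplete, so defer instead of deciding
        return Vote.ASK_LATER
    return Vote.NACK
```

the point is only that the rejoining node gets 'ask later' while the
standing node's vote is left alone, so quorum is not dropped in the
meantime.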

i think the trade-off here is that you sacrifice some responsiveness in
your failover times, since (i'm guessing) the timeout for declaring
peers unreachable errs on the side of caution.

the other hairy bit is determining the difference between a new 
(illegitimate) single-node membership, and the existing (legitimate) 
single-node membership. both are equally legitimate from the standpoint 
of each client, which can see the qdevice, but not the peer, and from 
the standpoint of the qdevice, which can see both clients.

as such, i suspect that this all comes right back to figuring out how to 
implement issue #7.

>> Knet adds a layer which monitors the links between each of the nodes,
>> and it marks a link active only after it has received the configured
>> number of "pong" packets. The idea is to have evidence of a reasonably
>> stable link. As long as the link is not active, no data packet goes
>> through (corosync traffic is just "data"). This basically means that
>> the initial corosync multicast is not delivered to the other nodes, so
>> corosync creates a single-node membership. After the link becomes
>> active, the "multicast" is delivered to the other nodes and they move
>> to the gather state.
> I would expect "reasonable timeout" to also take into account knet delay.
>> So to answer your question: the "delay" is on both nodes' side, because
>> the link is not established between the nodes.
> knet was expected to improve things, was not it? :)
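
for anyone following along, a toy model of the link gating jan describes
above. this is not knet code: Link, on_pong and send_data are invented
names, and real knet also handles ping timeouts and links going down,
which are left out here.

```python
class Link:
    """Toy link that passes data only after enough pongs were received."""

    def __init__(self, pong_count):
        self.pong_count = pong_count  # pongs required before the link is active
        self.pongs_seen = 0

    @property
    def active(self):
        return self.pongs_seen >= self.pong_count

    def on_pong(self):
        self.pongs_seen += 1

    def send_data(self, payload, delivered):
        """Append payload to delivered if the link is active, else drop it."""
        if not self.active:
            return False
        delivered.append(payload)
        return True


delivered = []
link = Link(pong_count=2)

# the first join message goes out before any pong has arrived, so it is
# dropped and the sender would end up alone in its membership
first_try = link.send_data("join: my view is {me}", delivered)

link.on_pong()
link.on_pong()

# once the link is active, the same message reaches the peer
second_try = link.send_data("join: my view is {me}", delivered)
```

that gap between the first and second attempt is where the single-node
membership shows up.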
