[ClusterLabs] temporary loss of quorum when member starts to rejoin
Strahil Nikolov
hunter86_bg at yahoo.com
Thu Apr 9 01:14:30 EDT 2020
On April 8, 2020 8:32:59 PM GMT+03:00, Sherrard Burton <sb-clusterlabs at allafrica.com> wrote:
>
>
>On 4/8/20 1:09 PM, Andrei Borzenkov wrote:
>> 08.04.2020 10:12, Jan Friesse wrote:
>>> Sherrard,
>>>
>>>> i could not determine which of these sub-threads to include this in,
>>>> so i am going to (reluctantly) top-post it.
>>>>
>>>> i switched the transport to udp, and in limited testing i seem to not
>>>> be hitting the race condition. of course i have no idea whether this
>>>> will behave consistently, or which part of the knet vs udp setup makes
>>>> the most difference.
>>>>
>>>> ie, is it the overhead of the crypto handshakes/setup? is there some
>>>> other knet layer that imparts additional delay in establishing
>>>> connection to other nodes? is the delay on the rebooted node, the
>>>> standing node, or both?
>>>>
>>>
>>> Very high level, what is happening in corosync when using udpu:
>>> - Corosync starts and begins in gather state -> sends a "multicast"
>>> (emulated by unicast to all expected members) message saying "I'm here
>>> and this is my view of live nodes".
>>> - In this state, corosync waits for answers
>>> - When a node receives this message, it "multicasts" the same message
>>> with an updated view of live nodes
>>> - After all nodes agree, they move to the next state (commit/recovery
>>> and finally operational)
>>>
>>> With udp, this happens instantly, so most of the time corosync doesn't
>>> even create a single-node membership, which would be created if no other
>>> nodes existed and/or replies weren't delivered in time.
>>>
>>
>> Is it possible to delay "creating single node membership" until some
>> reasonable initial timeout after corosync starts, to ensure the node's
>> view of the cluster is up to date? It is clear that there will always be
>> some corner cases, but at least this would make an "obviously correct"
>> configuration behave as expected.
>>
>> Corosync must already have a timeout to declare peers unreachable - it
>> sounds most logical to use it in this case.
>>
>
>i tossed that idea around in my head as well. basically if there was an
>analogue of client_leaving called client_joining that could be used to
>allow the qdevice to return 'ask later'.
>
>i think the trade-off here is that you sacrifice some responsiveness in
>your failover times, since (i'm guessing) the timeout for declaring
>peers unreachable errs on the side of caution.
>
>the other hairy bit is determining the difference between a new
>(illegitimate) single-node membership, and the existing (legitimate)
>single-node membership. both are equally legitimate from the standpoint
>of each client, which can see the qdevice, but not the peer, and from
>the standpoint of the qdevice, which can see both clients.
>
>as such, i suspect that this all comes right back to figuring out how
>to implement issue #7.
>
>
>>>
>>> Knet adds a layer which monitors the links between each of the nodes,
>>> and it makes a link active only after it has received the configured
>>> number of "pong" packets. The idea is to have evidence of a reasonably
>>> stable link. As long as the link is not active, no data packets go
>>> through (corosync traffic is just "data"). This basically means that
>>> the initial corosync "multicast" is not delivered to the other nodes,
>>> so corosync creates a single-node membership. After the link becomes
>>> active, the "multicast" is delivered to the other nodes and they move
>>> to gather state.
>>>
>>
>> I would expect the "reasonable timeout" to also take the knet delay
>> into account.
>>
>>> So to answer your question: the "delay" is on both nodes' sides,
>>> because the link is not established between the nodes.
>>>
>>
>> knet was expected to improve things, wasn't it? :)
>>
I would have increased the consensus timeout by several seconds.
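
For reference, a minimal sketch of where that setting lives in the totem section of corosync.conf. The numbers are illustrative assumptions only, not values taken from this thread. Per corosync.conf(5), consensus is given in milliseconds, must be at least 1.2 * token, and is calculated from token when it is not set explicitly.

```
totem {
    # Existing settings (version, cluster_name, transport, ...) stay as they are.

    # Token timeout in milliseconds (illustrative value, an assumption).
    token: 3000

    # How long to wait for consensus before starting a new round of
    # membership configuration, in milliseconds. Must be >= 1.2 * token.
    # 6000 is an illustrative assumption, a few seconds above that minimum.
    consensus: 6000
}
```

The file should be kept identical on all nodes. Whether a larger consensus value actually avoids the temporary loss of quorum described above would need to be tested on the affected cluster.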
Best Regards,
Strahil Nikolov