[ClusterLabs] temporary loss of quorum when member starts to rejoin
Sherrard Burton
sb-clusterlabs at allafrica.com
Tue Apr 7 09:32:19 EDT 2020
On 4/7/20 8:40 AM, Jan Friesse wrote:
> Sherrard,
>
>
>>
>>
>> On 4/7/20 12:53 AM, Strahil Nikolov wrote:
>>>
>>> Hi Sherrard,
>>>
>>> Have you tried to increase the qnet timers in the corosync.conf ?
>>>
>>
>> Strahil,
>> i have actually reduced the qnet timers in order to improve failover
>> response time, per Jan's suggestion on the thread '[ClusterLabs]
>> reducing corosync-qnetd "response time"'
>
> This is actually a different problem, and reduced qnetd and qdevice timers
> will not help. This problem is really about a 2-node cluster which is split
> in half into two single-node memberships. Qnetd then gives the vote to the
> node with the lowest node id, which in this case is the newly restarted node.
>
Jan,
i bought into Strahil's question about increasing the timers, not
because the timers are related to the tie-breaker per se, but because
the race condition seems to be triggered by (but not caused by) the fact
that qnetd is able to establish communication before knet does.
i.e., if the timing could be adjusted so that qnetd connects only after
knet, then the rebooted node would be able to see the running node
before contacting the qdevice.
of course, none of that would represent a real fix, and would actually
introduce a different set of problems. i just wanted to clarify my
interpretation of Strahil's question.
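
to make the race concrete, here is a toy model in Python of the tie-breaker behaviour described above. it is only a sketch under stated assumptions, not qnetd's actual algorithm: the function names are made up for illustration, the rebooted node is assumed to have node id 1 and the running node id 2, and the cluster is assumed to have 3 expected votes (two nodes plus the qdevice).

```python
# Toy model only: NOT the real corosync-qnetd implementation.
# pick_partition and quorate_nodes are hypothetical names for illustration.

def pick_partition(partitions, tie_breaker="lowest"):
    """Return the partition that receives the qdevice vote."""
    largest = max(len(p) for p in partitions)
    candidates = [p for p in partitions if len(p) == largest]
    if len(candidates) == 1:
        return candidates[0]
    # Equal-sized partitions: fall back to the tie-breaker on node id.
    if tie_breaker == "lowest":
        return min(candidates, key=min)
    return max(candidates, key=max)


def quorate_nodes(partitions, expected_votes=3):
    """Return the set of node ids that hold quorum.

    Assumes one vote per node plus one qdevice vote (3 in total).
    """
    winner = pick_partition(partitions)
    quorum = expected_votes // 2 + 1
    result = set()
    for p in partitions:
        votes = len(p) + (1 if p is winner else 0)
        if votes >= quorum:
            result |= p
    return result


# Rebooted node 1 reaches qnetd before knet is up: two single-node memberships.
print(quorate_nodes([{1}, {2}]))
# knet is up first: one two-node membership.
print(quorate_nodes([{1, 2}]))
```

in the first call the vote goes to the membership containing node 1, so the already-running node 2 drops below quorum until the memberships merge. in the second call both nodes are in one membership and stay quorate.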
> Regards,
> Honza
>