[ClusterLabs] temporary loss of quorum when member starts to rejoin

Sherrard Burton sb-clusterlabs at allafrica.com
Tue Apr 7 09:32:19 EDT 2020



On 4/7/20 8:40 AM, Jan Friesse wrote:
> Sherrard,
> 
> 
>>
>>
>> On 4/7/20 12:53 AM, Strahil Nikolov wrote:
>>>
>>> Hi Sherrard,
>>>
>>> Have you tried to increase the qnet timers in the corosync.conf ?
>>>
>>
>> Strahil,
>> i have actually reduced the qnet timers in order to improve failover 
>> response time, per Jan's suggestion on the thread '[ClusterLabs] 
>> reducing corosync-qnetd "response time"'
> 
> This is actually a different problem, and reduced qnetd and qdevice 
> timers will not help. This problem is really about a 2-node cluster 
> which is split in half into two single-node memberships. Qnetd then 
> gives its vote to the node with the lowest node id, which in this case 
> is the newly restarted node.
> 

Jan,
i bought into Strahil's question about increasing the timers, not 
because the timers are related to the tie-breaker, per se, but because 
the race condition seems to be triggered by (but not caused by) the fact 
that qnetd is able to establish communication before knet.

i.e., if the timing could be adjusted so that qnetd connects only after 
knet, then the rebooted node would be able to see the running node 
before contacting the qdevice.

of course, none of that would represent a real fix, and would actually 
introduce a different set of problems. i just wanted to clarify my 
interpretation of Strahil's question.
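
to make the ordering issue concrete, here is a toy model of the race as i 
understand it. it is not corosync or qnetd code: the function names, the 
node ids (1 for the rebooted node, 2 for the running node), and the 
simplified "lowest node id wins a tie" rule are all assumptions for 
illustration only, and the real qnetd algorithms take more into account.

```python
def qnetd_vote(partitions):
    """Toy tie-breaker (an assumption, not qnetd's real algorithm):
    prefer the largest partition; among equally sized partitions,
    pick the one containing the lowest node id."""
    largest = max(len(p) for p in partitions)
    candidates = [p for p in partitions if len(p) == largest]
    return min(candidates, key=min)


def running_node_keeps_vote(knet_up_before_qnetd):
    """Return True if the already-running node ends up with the vote."""
    rebooted, running = 1, 2  # hypothetical ids: rebooted node has the lower id

    if knet_up_before_qnetd:
        # The rebooted node sees its peer first, so qnetd is told about
        # a single two-node membership.
        partitions = [frozenset({rebooted, running})]
    else:
        # The rebooted node reaches qnetd first and reports a single-node
        # membership, so qnetd sees two single-node partitions.
        partitions = [frozenset({running}), frozenset({rebooted})]

    return running in qnetd_vote(partitions)


if __name__ == "__main__":
    print("knet first :", running_node_keeps_vote(True))
    print("qnetd first:", running_node_keeps_vote(False))
```

in this sketch the running node only loses the vote when the rebooted 
node reaches qnetd before it has seen its peer over knet, which is the 
ordering i was describing above.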


> Regards,
>    Honza
> 

