[ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

Andrei Borzenkov arvidjaar at gmail.com
Mon Mar 1 08:09:00 EST 2021


On 01.03.2021 15:45, Jan Friesse wrote:
> Andrei,
> 
>> On 01.03.2021 12:26, Jan Friesse wrote:
>>>>
>>>
>>> Thanks for digging into the logs. I believe Eric is hitting
>>> https://github.com/corosync/corosync-qdevice/issues/10 (already fixed,
>>> but it may take some time to get into distributions). The issue also
>>> contains a workaround.
>>>
>>
>> I tested corosync-qnetd at df3c672, which should include these fixes. It
>> changed the behavior, but I still cannot explain it.
>>
>> Again: ha1+ha2+qnetd, ha2 is the current DC, I disconnect ha1 (block
>> everything with ha1's source MAC), stonith disabled. corosync and
> 
> So ha1 is blocked on both ha2 and qnetd and blocking is symmetric (I
> mean, nothing is sent to ha1 and nothing is received from ha1)?
> 

No, it is asymmetric. ha1 cannot *send* anything to ha2 or qnetd; it
should be able to *receive* from both.

>> corosync-qdevice on the nodes are still 2.4.5, if it matters.
> 
> Shouldn't really matter as long as both corosync-qdevice and
> corosync-qnetd are version 3.0.1.
> 

corosync-qdevice on the nodes is still 2.4.5. corosync-qnetd on the witness
is a git snapshot from last November. I was not sure whether I could mix
corosync and corosync-qdevice of different versions, and judging by the git
commits, all the changes seem to be in qnetd anyway.

...

> 
> That's a bit harder to explain, but there is a reason for it.
> 

OK, thank you.
...
> 
> No matter what, are you able to provide a step-by-step reproducer of
> that 40 sec delay? 

No. As I said, the next time I tested I got entirely different timing. I
will try again after a cold boot.

