[ClusterLabs] Fast-failover on 2 nodes + qnetd: qdevice connection disrupted.

Fri May 3 05:34:50 EDT 2024

Hi,
some of your findings are really interesting.

On 02/05/2024 01:56, alexey at pavlyuts.ru wrote:
> Hi All,
> 
>   
> 
> I am trying to build an application-specific 2-node failover cluster using
> Ubuntu 22, pacemaker 2.1.2 + corosync 3.1.6 and DRBD 9.2.9, with knet transport.
> 

...

>   
> 
> Also, I did a Wireshark capture and found a great mess in TCP: it seems the
> connection between qdevice and qnetd really stops for some time and packets
> are not delivered.

Could you check UDP? I guess there are a lot of UDP packets sent by 
corosync, which probably keeps TCP from getting through.
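
One rough way to check this without a full capture is to sample the kernel
counters twice and compare. The sketch below is only an illustration: it
assumes a Linux host where /proc/net/snmp exposes the Udp and Tcp counter
lines, and the function names (parse_snmp, rates, watch) are made up for this
example. The counters are system-wide, so they include all UDP and TCP
traffic on the node, not only corosync and qdevice.

```python
import time


def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}."""
    stats = {}
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    # The file comes in pairs: a line of counter names, then a line of values.
    for names, values in zip(lines[0::2], lines[1::2]):
        proto = names[0].rstrip(":")
        stats[proto] = {n: int(v) for n, v in zip(names[1:], values[1:])}
    return stats


def rates(before, after, seconds):
    """Per-second change of a few counters between two snapshots."""
    wanted = [
        ("Udp", "InDatagrams"),
        ("Udp", "OutDatagrams"),
        ("Udp", "RcvbufErrors"),
        ("Tcp", "RetransSegs"),
    ]
    return {
        f"{proto}.{name}": (after[proto][name] - before[proto][name]) / seconds
        for proto, name in wanted
    }


def watch(interval=5, path="/proc/net/snmp"):
    """Take two snapshots 'interval' seconds apart and print the rates."""
    with open(path) as f:
        first = parse_snmp(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_snmp(f.read())
    for name, value in rates(first, second, interval).items():
        print(f"{name}: {value:.1f}/s")
```

Calling watch() while the qdevice connection is stalling, and again while the
cluster is idle, should show whether the UDP datagram rate and the TCP
retransmission rate rise together.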

> 
>   
> 
> My guess is that it matches corosync syncing activity, and I suspect that
> corosync prevents any other traffic on the interface it uses for rings.
> 
>   
> 
> When I switch qnetd and qdevice to a different interface, it seems to work
> fine.

Actually, having a dedicated interface just for corosync/knet traffic is 
the optimal solution. qdevice+qnetd, on the other hand, should be as close 
to the "customer" as possible.

So if you could have two interfaces (one just for corosync, the second for 
qnetd+qdevice+publicly accessible services), that might be a solution?

> 
>   
> 
> So, the question is: does corosync really temporarily block any other traffic
> on the interface it uses? Or is it just a coincidence? If it blocks, is

Nope, no "blocking". But it sends quite a few UDP packets, and I guess 
it can really use all the available bandwidth, so no TCP gets through.
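
To see whether the link is actually being saturated, the interface byte
counters can be compared over a short interval. This is a minimal sketch,
assuming a Linux host with /proc/net/dev; the function names are invented
for the example, and the link speed is something you have to supply yourself
(it is not read from the interface here).

```python
def parse_net_dev(text):
    """Parse /proc/net/dev-style text into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    # The first two lines are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def utilization(before, after, seconds, link_mbit):
    """Return (rx, tx) as fractions of the link capacity.

    'before' and 'after' are (rx_bytes, tx_bytes) tuples for one interface,
    'link_mbit' is the assumed link speed in Mbit/s.
    """
    capacity = link_mbit * 1_000_000  # bits per second
    rx = (after[0] - before[0]) * 8 / seconds / capacity
    tx = (after[1] - before[1]) * 8 / seconds / capacity
    return rx, tx
```

Read /proc/net/dev twice a known number of seconds apart, pass the two tuples
for the ring interface to utilization(), and values close to 1.0 during the
stall would support the bandwidth guess.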

Honza

> there a way to manage it?
> 
>   
> 
> Thank you for any suggestions on that!
> 
>   
> 
> Sincerely,
> 
>   
> 
> Alex