[ClusterLabs] Two ethernet adapter within same subnet causing issue on Qdevice

Jan Friesse jfriesse at redhat.com
Tue Oct 6 03:50:07 EDT 2020


Richard,

> To clarify my problem, this is more of a Qdevice issue that I want to fix.

The question is how much of this is really a qdevice problem and, if it
is, whether there is anything we can do about it.

Qdevice itself just uses the standard connect(2) call on a standard TCP
socket. So from qdevice's point of view, it is really up to the kernel to
decide how to route packets to reach qnetd.

It is clear that ifdown made qdevice lose its connection to qnetd
(that's why the IP changed from the ens192 address to the ens256 address),
and the standard qdevice behavior is to try to reconnect. Qdevice itself
does not bind to any specific address (it is really just a client), so
after calling connect(2) qdevice reached qnetd via the other (working)
interface.
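
As a rough illustration of that last point (a Python sketch, not qdevice
code; the loopback listener is only a stand-in for qnetd and the function
name is made up): an unbound TCP client has no local address until
connect(2), at which point the kernel picks the source address from its
routing table.

```python
import socket


def local_address_chosen_by_kernel(server_addr):
    """Connect an unbound TCP client; return its local address before and after."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        # No bind() call, just like a plain client: the local address is unset.
        before = client.getsockname()
        # The kernel consults its routing table here and selects the source IP.
        client.connect(server_addr)
        after = client.getsockname()
    return before, after


if __name__ == "__main__":
    # Stand-in for qnetd: a listener on loopback with an OS-assigned port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        before, after = local_address_chosen_by_kernel(server.getsockname())
        print("before connect:", before)
        print("after connect: ", after)
```

On a host with two adapters in the same subnet, the address reported after
connect would be that of whichever interface the kernel's route to the
server uses.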

So I would suggest trying the method recommended by Andrei (add a host route).
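
A host route of that kind could look roughly like the following. This is
only a sketch under assumptions: the addresses 192.0.2.10 (qnetd) and
192.0.2.21 (ens192 on the node) are made-up placeholders, the helper name
is hypothetical, and it assumes qnetd is reachable directly on the shared
subnet (pass a gateway otherwise). The function only builds the iproute2
command line and does not run it, since adding a route requires root.

```python
import ipaddress
import shlex


def host_route_command(qnetd_ip, device, source_ip=None, gateway=None):
    """Build (but do not run) an iproute2 command adding an IPv4 host route."""
    # IPv4Address raises ValueError if the string is not a valid IPv4 address.
    dest = ipaddress.IPv4Address(qnetd_ip)
    argv = ["ip", "route", "add", f"{dest}/32"]
    if gateway is not None:
        # Only needed when qnetd is not on the directly connected subnet.
        argv += ["via", str(ipaddress.IPv4Address(gateway))]
    argv += ["dev", device]
    if source_ip is not None:
        # Preferred source address for connections that use this route.
        argv += ["src", str(ipaddress.IPv4Address(source_ip))]
    return argv


# Placeholder addresses; substitute the real qnetd and ens192 addresses.
print(shlex.join(host_route_command("192.0.2.10", "ens192", source_ip="192.0.2.21")))
# prints: ip route add 192.0.2.10/32 dev ens192 src 192.0.2.21
```

After adding such a route, `ip route get 192.0.2.10` should show which
device and source address the kernel will use. Keep in mind that Linux
normally drops routes through an interface when that interface goes down,
so it is worth checking the result again after the ifdown test.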

Regards,
   Honza

> See below for more detail.
> Thank you,
> Richard
> 
>      ----- Original message -----
>      From: Andrei Borzenkov <arvidjaar at gmail.com>
>      Sent by: "Users" <users-bounces at clusterlabs.org>
>      To: users at clusterlabs.org
>      Cc:
>      Subject: [EXTERNAL] Re: [ClusterLabs] Two ethernet adapter within same
>      subnet causing issue on Qdevice
>      Date: Thu, Oct 1, 2020 2:45 PM
>      01.10.2020 20:09, Richard Seo wrote:
>       > Hello everyone,
>       > I'm trying to set up a cluster with two hosts;
>       > both have two ethernet adapters, all within the same subnet.
>       > I've created a resource for one adapter on each host.
>       > Here is an example:
>       > Stack: corosync
>       > Current DC: <host 1> (version 2.0.2-1.el8-744a30d655) - partition with quorum
>       > Last updated: Thu Oct  1 12:50:48 2020
>       > Last change: Thu Oct  1 12:32:53 2020 by root via cibadmin on <host 1>
>       > 2 nodes configured
>       > 2 resources configured
>       > Online: [ <host1> <host2> ]
>       > Active resources:
>       > db2_<host1>_ens192    (ocf::heartbeat:db2ethmon):     Started <host1>
>       > db2_<host2>_ens192    (ocf::heartbeat:db2ethmon):     Started <host2>
>       > I also have a qdevice setup:
>       > # corosync-qnetd-tool -l
>       > Cluster "hadom":
>       >      Algorithm:        LMS
>       >      Tie-breaker:    Node with lowest node ID
>       >      Node ID 2:
>       >          Client address:        ::ffff:<ip for ens192 for host 2>:40044
>       >          Configured node list:    1, 2
>       >          Membership node list:    1, 2
>       >          Vote:            ACK (ACK)
>       >      Node ID 1:
>       >          Client address:        ::ffff:<*ip for ens192 for host 1*>:37906
>       >          Configured node list:    1, 2
>       >          Membership node list:    1, 2
>       >          Vote:            ACK (ACK)
>       > When I ifconfig down ens192 on host 1, it looks like qdevice changes the
>       > Client address to the other adapter and still gives quorum to the lowest
>       > node ID (which is host 1 in this case), even when the network is down for
>       > host 1.
> 
>      Network on host 1 is obviously not down as this host continues to
>      communicate with the outside world. Network may be down for your
>      specific application but then it is up to resource agent for this
>      application to detect it and initiate failover.
>      The network (ens192) on host 1 is down. host1 can still communicate with the
>      world because host1 has another network adapter (ens256). However, only
>      ens192 was configured as a resource. I've also specifically configured the
>      ens192 IP address in corosync.conf.
>      I want the network on host 1 to be down. That way, I can reproduce the
>      problem where quorum is given to the wrong node.
> 
>       > Cluster "hadom":
>       >      Algorithm:        LMS
>       >      Tie-breaker:    Node with lowest node ID
>       >      Node ID 2:
>       >          Client address:        ::ffff:<ip for ens192 for host 2>:40044
>       >          Configured node list:    1, 2
>       >          Membership node list:    1, 2
>       >          Vote:            ACK (ACK)
>       >      Node ID 1:
>       >          Client address:        ::ffff:<*ip for ens256 for host 1*>:37906
>       >          Configured node list:    1, 2
>       >          Membership node list:    1, 2
>       >          Vote:            ACK (ACK)
>       > Is there a way we can force qdevice to only route through a specified adapter
>       > (ens192 in this case)?
> 
>      Create host route via specific device.
>      I've looked over the docs but haven't found a way to do this. I've tried
>      configuring corosync.conf using the specific IP addresses. Could you specify
>      how to route through a specific network adapter from a qdevice?
> 
>       > Also, while I'm on this topic, are multiple communication rings supported
>       > with Pacemaker, or will they be supported in the near future?
> 
>      What exactly do you mean? What communication are you talking about?
> 
>      You seem to confuse multiple layers here. qnetd and pacemaker are two
>      independent things.
>      So this is a separate question regarding Pacemaker and Corosync. I want to
>      know whether having multiple communication rings in the nodelist in
>      corosync.conf is supported by Pacemaker with Corosync right now. The
>      communication protocol is called the Redundant Ring Protocol.
> 
>      _______________________________________________
>      Manage your subscription:
>      https://lists.clusterlabs.org/mailman/listinfo/users
> 
>      ClusterLabs home: https://www.clusterlabs.org/