[ClusterLabs] IPaddr2 works for 12 seconds then stops

Thu Oct 11 13:25:52 EDT 2018

I'm adding a VIP to my active/active two-node cluster using IPaddr2. 
Both nodes run updated CentOS 7.5.

When I bring up the IP, I'm able to ping it from an external machine for 
about 12 seconds and then I get no further responses. This happens each 
time I restart the VIP clone.

I can bring up the IP as a static alias IP on either of the two servers 
and it works fine; i.e., I can then ping it continuously from my external 
server. It's only when I try to cluster the IP that I have the issue.

During the 12-second window in which it *does* work, it appears to work 
on only one of the two servers (and always the same one). My pings run 
continuously for those twelve seconds and then stop, while attempts to 
hit the Web server are hit or miss depending on my source port (I'm 
using sourceip-sourceport); i.e., it looks as if anything that would be 
handled by the other server isn't making it through. After the 12 
seconds, neither server responds to requests against the VIP (though 
both respond fine on their own static IPs at all times).
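
To illustrate the split described above: with a hash mode of
sourceip-sourceport, each node is expected to answer only the flows
whose hash lands in its own bucket, so two connections from the same
client can be owned by different nodes depending on the source port.
The following is a minimal sketch of that idea only. It uses cksum as a
stand-in hash (the kernel module uses its own hash function, so the node
numbers printed here will not match a real cluster), and bucket_for and
the client address 192.168.120.50 are hypothetical, chosen for the
example.

```shell
#!/bin/sh
# Toy model of hash-based flow ownership. NOT the kernel's hash:
# cksum is only a stand-in to show the bucketing idea.

# bucket_for SRC_IP SRC_PORT TOTAL_NODES
# Prints a node number in the range 1..TOTAL_NODES.
bucket_for() {
    h=$(printf '%s:%s' "$1" "$2" | cksum | cut -d' ' -f1)
    echo $(( h % $3 + 1 ))
}

# Hypothetical client address; only the source port varies.
for port in 40000 40001 40002 40003; do
    echo "192.168.120.50:$port -> node $(bucket_for 192.168.120.50 "$port" 2)"
done
```

If one of the two nodes never answers, every flow that maps to its
bucket goes unanswered, which would look like the port-dependent hit or
miss behaviour described above.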

During the 12 seconds that it works I get these in the logs of the 
server that *is* responding:

Oct 11 12:17:43 node2 kernel: ipt_CLUSTERIP: unknown protocol 1
Oct 11 12:17:44 node2 kernel: ipt_CLUSTERIP: unknown protocol 1
Oct 11 12:17:45 node2 kernel: ipt_CLUSTERIP: unknown protocol 1
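
For reference when reading these messages: the trailing number is the
IP protocol number of the packet that triggered the message, and
protocol 1 is ICMP, which is what ping uses. A minimal sketch for
decoding that field from a log line follows; proto_from_log and
proto_name are hypothetical helper names, and the lookup covers only
three well-known protocol numbers.

```shell
#!/bin/sh
# Extract the trailing protocol number from an "unknown protocol N" line.
proto_from_log() {
    printf '%s\n' "$1" | sed -n 's/.*unknown protocol \([0-9][0-9]*\)$/\1/p'
}

# Map a few well-known IP protocol numbers to names.
proto_name() {
    case "$1" in
        1)  echo icmp ;;
        6)  echo tcp ;;
        17) echo udp ;;
        *)  echo unknown ;;
    esac
}

line='Oct 11 12:17:43 node2 kernel: ipt_CLUSTERIP: unknown protocol 1'
proto_name "$(proto_from_log "$line")"   # prints: icmp
```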

Looking at the CLUSTERIP rules created, they *seem* to be ok to me (I 
also tried shutting down the cluster and setting up the rules/adding the 
IP "by hand" with the same results):

[root@node2 ~]# iptables -L -n | grep -i cluster
CLUSTERIP  all  --  0.0.0.0/0            192.168.120.101       CLUSTERIP hashmode=sourceip-sourceport clustermac=01:00:5E:5C:4B:8A total_nodes=2 local_node=2 hash_init=0
[root@colovs2 ~]# cat /proc/net/ipt_CLUSTERIP/192.168.120.101
2

[root@node1 ~]# iptables -L -n | grep -i cluster
CLUSTERIP  all  --  0.0.0.0/0            192.168.120.101       CLUSTERIP hashmode=sourceip-sourceport clustermac=01:00:5E:5C:4B:8A total_nodes=2 local_node=1 hash_init=0
[root@node1 ~]# cat /proc/net/ipt_CLUSTERIP/192.168.120.101
1
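
One way to double-check that the two nodes agree on everything except
local_node is to pull the key=value fields out of each rule and compare
them. A minimal sketch, using the option strings from the two rules
above as literal input; rule_field is a hypothetical helper name.

```shell
#!/bin/sh
# rule_field RULE KEY
# Prints the value of KEY=value found in a CLUSTERIP rule line.
rule_field() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n "s/^$2=//p"
}

node1='CLUSTERIP hashmode=sourceip-sourceport clustermac=01:00:5E:5C:4B:8A total_nodes=2 local_node=1 hash_init=0'
node2='CLUSTERIP hashmode=sourceip-sourceport clustermac=01:00:5E:5C:4B:8A total_nodes=2 local_node=2 hash_init=0'

# Every field except local_node should be identical on both nodes.
for key in hashmode clustermac total_nodes hash_init local_node; do
    a=$(rule_field "$node1" "$key")
    b=$(rule_field "$node2" "$key")
    if [ "$a" = "$b" ]; then
        echo "$key: same ($a)"
    else
        echo "$key: differs (node1=$a node2=$b)"
    fi
done
```

With the rules shown above, only local_node is reported as differing.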

Above was with a MAC that I forced into my VIP setup (see below), but I 
also tried with no MAC address provided (using the IPaddr2 default) with 
the same result.
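
As a sanity check on the forced MAC: the iptables CLUSTERIP
documentation requires the cluster MAC to be a link-layer multicast
address, which means the least significant bit of the first octet is
set. A minimal sketch of that check follows; is_multicast_mac is a
hypothetical helper name, and it expects a colon-separated MAC.

```shell
#!/bin/sh
# Succeeds if the first octet of the MAC has its low bit set
# (the Ethernet group/multicast bit).
is_multicast_mac() {
    first=${1%%:*}
    [ $(( 0x$first & 1 )) -eq 1 ]
}

# The cluster MAC from the VIP setup, then the bond0 hardware address.
for mac in 01:00:5e:5c:4b:8a 14:18:77:32:e3:d4; do
    if is_multicast_mac "$mac"; then
        echo "$mac: multicast"
    else
        echo "$mac: unicast"
    fi
done
```

The forced MAC passes this check, while the bond0 hardware address is
an ordinary unicast address.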

The logs just seem to note the initialization with (apparently) nothing 
else interesting:

[root@node1 corosync]# cat /var/log/messages | grep -i ipaddr2
Oct 11 12:44:31 node1 IPaddr2(VIP:0)[105006]: INFO: Adding inet address 192.168.120.101/24 with broadcast address 192.168.120.255 to device bond0
Oct 11 12:44:31 node1 IPaddr2(VIP:0)[105006]: INFO: Bringing device bond0 up
Oct 11 12:44:31 node1 IPaddr2(VIP:0)[105006]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -c 5 -p /var/run/resource-agents/send_arp-192.168.120.101 -I bond0 -m 01005e5c4b8a 192.168.120.101

And then:

[root@node1 corosync]# ip addr show bond0
10: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
     link/ether 14:18:77:32:e3:d4 brd ff:ff:ff:ff:ff:ff
     inet 192.168.120.80/24 brd 192.168.120.255 scope global bond0
        valid_lft forever preferred_lft forever
     inet 192.168.120.101/24 brd 192.168.120.255 scope global secondary bond0
        valid_lft forever preferred_lft forever
     inet6 fe80::1618:77ff:fe32:e3d4/64 scope link
        valid_lft forever preferred_lft forever

Finally:

[root@node1 corosync]# pcs resource show VIP-clone
  Clone: VIP-clone
   Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true interleave=true
   Resource: VIP (class=ocf provider=heartbeat type=IPaddr2)
    Attributes: cidr_netmask=24 ip=192.168.120.101 nic=bond0 clusterip_hash=sourceip-sourceport mac=01:00:5e:5c:4b:8a
    Meta Attrs: resource-stickiness=0
    Utilization: weight=100
    Operations: monitor interval=10 timeout=20 (VIP-monitor-interval-10)
                start interval=0s timeout=20s (VIP-start-interval-0s)
                stop interval=0s timeout=20s (VIP-stop-interval-0s)

Don't know where to look next. Any ideas?