[ClusterLabs] Random failure with clone of IPaddr2

Thu Dec 15 14:27:35 EST 2016

On 12/15/2016 12:37 PM, alian at amisw.com wrote:
> Hi,
> 
> I have been having trouble for a week and can't find a solution by
> myself. Any help would be really appreciated!
> I have used corosync / pacemaker for 3 or 4 years and everything has
> worked well, for failover and load balancing.
> 
> I have a shared IP between 3 servers, and needed to remove one for an
> upgrade. But after I removed the server from the cluster, I got random
> failures accessing my shared IP. At first I thought some packets were
> still going to the old server, so I put it back in the cluster. I can
> reach it, but the random failures are still here :-/
> 
> My test is just a curl http://my_ip (ssh behaves the same way: it
> randomly fails to connect).
> A ping doesn't lose any packets.
> I can reach each of the three servers, but sometimes the request hangs
> and times out.
> With tcpdump I see the packets arriving and being resent, but no node
> responds. How can I diagnose this?
> I think about one request in five fails. But I don't see any messages in
> the firewall logs or /var/log/message, nothing, just as if the switch
> chose to drop random packets. I don't see any counters increasing on the
> network interfaces. I have checked the iptables settings, rechecked the
> logs, rechecked all the firewalls ... Where do these packets go??
> 
> I tried with another new IP, and the same problem happens. I tried IPs
> on two different subnets (10.xxx and an external IP), with the same
> result.
> 
> I have no problems with virtual IPs in failover mode.
> 
> If someone has any clue ...

Seeing your configuration might help. Did you set globally-unique=true
and clone-node-max=3 on the clone? If not, the other nodes can't pick up
the lost node's share of requests.
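
To illustrate why those two options matter, here is a rough toy model in
Python. It assumes the cloned IP spreads connections over a fixed number
of hash buckets (one per clone instance) keyed on source IP and source
port, and that a node only answers for the buckets it currently holds.
The CRC32 hash, the node names, and the client address are stand-ins
invented for the example, not what the kernel or the resource agent
actually uses, so treat it as a sketch of the idea rather than a
diagnosis of your cluster.

```python
import zlib


def bucket(src_ip: str, src_port: int, total_buckets: int) -> int:
    """Map a connection to a bucket in 1..total_buckets (stand-in hash)."""
    key = f"{src_ip}:{src_port}".encode()
    return zlib.crc32(key) % total_buckets + 1


def unanswered_fraction(owned: dict, total_buckets: int, connections: list) -> float:
    """Return the fraction of connections whose bucket no node holds."""
    claimed = set().union(*owned.values())
    missed = sum(
        1 for ip, port in connections
        if bucket(ip, port, total_buckets) not in claimed
    )
    return missed / len(connections)


# One hypothetical client opening many connections from different ports.
connections = [("198.51.100.7", port) for port in range(40000, 43000)]

# clone-node-max=1 (the default): each node may hold only one instance,
# so when node3 leaves, nobody can take over bucket 3.
stuck = {"node1": {1}, "node2": {2}}

# clone-node-max=3 with globally-unique=true: a surviving node is allowed
# to run the orphaned instance as well.
recovered = {"node1": {1, 3}, "node2": {2}}

print("unanswered, bucket orphaned:  ", unanswered_fraction(stuck, 3, connections))
print("unanswered, bucket taken over:", unanswered_fraction(recovered, 3, connections))
```

In this model a ping from one client always lands in the same bucket (no
ports to vary), while each new TCP connection uses a different source
port and may land in an orphaned bucket, which would match "ping is
fine, curl hangs now and then".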