[ClusterLabs] Random failure with clone of IPaddr2

alian at amisw.com
Thu Dec 15 13:17:26 EST 2016


Hi,

I have been having trouble for a week and can't find a solution by myself. Any
help would be really appreciated!
I have been using Corosync/Pacemaker for 3 or 4 years, and everything has
worked well, both for failover and for load balancing.

I have a shared IP across 3 servers and needed to remove one of them for an
upgrade. But after I removed that server from the cluster, I started getting
random failures when accessing the shared IP. At first I thought some packets
were still going to the old server, so I put it back in the cluster. I can
reach it, but the random failures are still there :-/

My test is just a curl http://my_ip (or ssh, same thing: connections randomly
fail).
A ping doesn't lose any packets.
I can reach each of the three servers, but sometimes the request hangs and
times out.
With tcpdump I see the packets arriving and being retransmitted, but nobody
responds. How can I diagnose this?
I think about one request in five fails. But I don't see any messages in the
firewall logs or in /var/log/messages, nothing; it's as if the switch decided
to drop random packets. I don't see any errors in the network interface
counters. I checked the iptables settings, rechecked the logs, rechecked all
the firewalls... Where do these packets go??
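To put a number on the "one request in five" and make the curl/ssh test repeatable, it could be scripted. Below is a minimal Python sketch; the helper names `probe`, `run` and `failure_rate` are hypothetical and not part of Pacemaker or any cluster tool. It opens a fresh TCP connection for each attempt and logs the local source port used. The idea, which is an assumption and not a confirmed diagnosis, is that if the cloned IP spreads connections across nodes by hashing the client address and port, failures should line up with particular source ports rather than being truly random. A ping has no ports, so it would not show the problem in that case.

```python
import socket
import sys
import time


def probe(host, port=80, timeout=2.0):
    """Try one TCP connect (IPv4 only); return (succeeded, local_source_port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # Binding to port 0 makes the kernel pick the ephemeral source port
        # before connect(), so it can be logged even when the connect fails.
        sock.bind(("", 0))
        src_port = sock.getsockname()[1]
        try:
            sock.connect((host, port))
            return True, src_port
        except OSError:
            # Timeouts (socket.timeout) and refusals are both OSError subclasses.
            return False, src_port
    finally:
        sock.close()


def failure_rate(results):
    """Fraction of failed attempts in a list of (ok, src_port) tuples."""
    if not results:
        return 0.0
    return sum(1 for ok, _ in results if not ok) / len(results)


def run(host, port=80, attempts=50, delay=0.2, timeout=2.0):
    """Probe host:port repeatedly, printing the source port of each attempt."""
    results = []
    for _ in range(attempts):
        ok, src = probe(host, port, timeout)
        print(f"src_port={src} {'ok' if ok else 'FAILED'}")
        results.append((ok, src))
        time.sleep(delay)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    res = run(sys.argv[1])
    print(f"failure rate: {failure_rate(res):.0%}")
```

Run it as, for example, `python3 probe_vip.py my_ip` (the file name is hypothetical) from a client machine, and compare the failing source ports with what tcpdump shows on each node.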

I tried another new IP, and the same problem happened. I tried IPs on two
different subnets (10.xxx and an external IP), with the same result.

I have no problem with virtual IPs in failover mode.

If anyone has a clue...
