[ClusterLabs] weird corosync - [TOTEM ] FAILED TO RECEIVE
Jan Friesse
jfriesse at redhat.com
Mon Oct 15 02:24:51 EDT 2018
lejeczek,
> hi guys,
> I have a 3-node cluster (CentOS 7.5). Two nodes seem fine, but the third (or
> possibly something in between) is not right.
> I see this:
>
> $ pcs status --all
> Cluster name: CC
> Stack: corosync
> Current DC: whale.private (version 1.1.18-11.el7_5.3-2b07d5c5a9) -
> partition with quorum
> Last updated: Fri Oct 12 15:40:39 2018
> Last change: Fri Oct 12 15:14:57 2018 by root via crm_resource on
> whale.private
>
> 3 nodes configured
> 8 resources configured (1 DISABLED)
>
> Online: [ rental.private whale.private ]
> OFFLINE: [ rider.private ]
>
> and that third node logs:
>
> [TOTEM ] FAILED TO RECEIVE
> [TOTEM ] A new membership (10.5.6.100:2504344) was formed. Members
> left: 2 4
> [TOTEM ] Failed to receive the leave message. failed: 2 4
> [QUORUM] Members[1]: 1
> [MAIN ] Completed service synchronization, ready to provide service.
> [TOTEM ] A new membership (10.5.6.49:2504348) was formed. Members
> joined: 2 4
> [TOTEM ] FAILED TO RECEIVE
>
> and it just keeps going like that.
> Sometimes a reboot (or stopping the services, waiting, and starting them
> again) of that third node helps.
> But I get into this situation almost every time a node is shut down or
> rebooted in an orderly way.
> Network-wise, connectivity seems okay. Where should I start?
>
A little more information would be helpful (corosync version, protocol
used - udpu/udp, corosync.conf, ...), but here are a few possible problems:
- If UDP (multicast) is used, try UDPU (unicast) instead
- Check the firewall
- Try reducing the MTU used by corosync (option netmtu in corosync.conf)
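
As a rough sketch of the first and third suggestions, the totem section of
corosync.conf might look something like the fragment below. This assumes
corosync 2.x; the netmtu value of 1200 is only an illustrative guess (the
default is 1500), and the cluster name is taken from the pcs output above.
The rest of the file (nodelist, quorum, logging) is not shown.

```
totem {
    version: 2
    cluster_name: CC

    # Use UDP unicast instead of multicast. With udpu, every node
    # must also be listed in a nodelist section with its ring0_addr.
    transport: udpu

    # Assumed example value: lower than the default of 1500 to rule
    # out fragmentation problems on the path between the nodes.
    netmtu: 1200
}
```

The same file should be present on all nodes, and corosync has to be
restarted on each of them for a transport change to take effect.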
Regards,
Honza
> many thanks, L