[ClusterLabs] weird corosync - [TOTEM ] FAILED TO RECEIVE

Ken Gaillot kgaillot at redhat.com
Fri Oct 12 18:41:25 EDT 2018


On Fri, 2018-10-12 at 15:51 +0100, lejeczek wrote:
> hi guys,
> I have a 3-node cluster (CentOS 7.5). 2 nodes seem fine, but the
> third (or probably something else in between) is not right.
> I see this:
> 
>   $ pcs status --all
> Cluster name: CC
> Stack: corosync
> Current DC: whale.private (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
> Last updated: Fri Oct 12 15:40:39 2018
> Last change: Fri Oct 12 15:14:57 2018 by root via crm_resource on whale.private
> 
> 3 nodes configured
> 8 resources configured (1 DISABLED)
> 
> Online: [ rental.private whale.private ]
> OFFLINE: [ rider.private ]
> 
> and that third node logs:
> 
>   [TOTEM ] FAILED TO RECEIVE
>   [TOTEM ] A new membership (10.5.6.100:2504344) was formed. Members left: 2 4
>   [TOTEM ] Failed to receive the leave message. failed: 2 4
>   [QUORUM] Members[1]: 1
>   [MAIN  ] Completed service synchronization, ready to provide service.
>   [TOTEM ] A new membership (10.5.6.49:2504348) was formed. Members joined: 2 4
>   [TOTEM ] FAILED TO RECEIVE
> 
> and it just keeps going like that.
> Sometimes a reboot (or stopping the services, waiting, then starting
> them again) of that third node helps.
> But I get this situation almost every time a node is (orderly)
> shut down or rebooted.
> Network-wise, connectivity seems okay. Where to start?
> 
> many thanks, L

Odd. I'd recommend turning on debug logging in corosync.conf, and
posting the log here. Hopefully one of the corosync developers can
chime in at that point.
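
For reference, a minimal sketch of what that could look like in the
logging section of /etc/corosync/corosync.conf. The logfile path below
is an assumption (it is what pcs-generated configs on CentOS 7 commonly
use), so keep whatever your file already has and only add the debug
line:

    logging {
        to_logfile: yes
        # assumed path; keep the one already in your config
        logfile: /var/log/cluster/corosync.log
        to_syslog: yes
        timestamp: on
        # the actual change: enables debug output for all subsystems
        debug: on
    }

Corosync has to pick up the change before it takes effect; restarting
the service on the affected node is the simplest way. Debug output is
verbose, so it is probably worth setting it back to "off" once the
failure has been captured.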
-- 
Ken Gaillot <kgaillot at redhat.com>


