[ClusterLabs] weird corosync - [TOTEM ] FAILED TO RECEIVE
lejeczek
peljasz at yahoo.co.uk
Fri Nov 23 09:51:47 EST 2018
On 15/10/2018 07:24, Jan Friesse wrote:
> lejeczek,
>
>> hi guys,
>> I have a 3-node cluster (CentOS 7.5); 2 nodes seem fine but the third (or
>> probably something else in between) is not right.
>> I see this:
>>
>> $ pcs status --all
>> Cluster name: CC
>> Stack: corosync
>> Current DC: whale.private (version 1.1.18-11.el7_5.3-2b07d5c5a9) -
>> partition with quorum
>> Last updated: Fri Oct 12 15:40:39 2018
>> Last change: Fri Oct 12 15:14:57 2018 by root via crm_resource on
>> whale.private
>>
>> 3 nodes configured
>> 8 resources configured (1 DISABLED)
>>
>> Online: [ rental.private whale.private ]
>> OFFLINE: [ rider.private ]
>>
>> and that third node logs:
>>
>> [TOTEM ] FAILED TO RECEIVE
>> [TOTEM ] A new membership (10.5.6.100:2504344) was formed. Members
>> left: 2 4
>> [TOTEM ] Failed to receive the leave message. failed: 2 4
>> [QUORUM] Members[1]: 1
>> [MAIN ] Completed service synchronization, ready to provide service.
>> [TOTEM ] A new membership (10.5.6.49:2504348) was formed. Members
>> joined: 2 4
>> [TOTEM ] FAILED TO RECEIVE
>>
>> and it just keeps going like that.
>> Sometimes a reboot (or stopping the services, waiting, then starting
>> them again) of that third node helps.
>> But I get this situation almost every time a node gets (orderly)
>> shut down or rebooted.
>> Network-wise, connectivity seems okay. Where to start?
>>
>
> A little more information would be helpful (corosync version, protocol
> used - udpu/udp, corosync.conf, ...), but here are a few possible problems:
> - If UDP (multicast) is used, try UDPU
> - Check the firewall
> - Try reducing the MTU used by corosync (option netmtu in corosync.conf)
>
> Regards,
> Honza
>
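For reference, the transport and netmtu settings mentioned above are plain "key: value" lines in corosync.conf. A rough sketch for reading them out follows; the helper name conf_option is made up for illustration, the path in the usage comment is assumed to be the usual CentOS 7 location, and the helper does not check which section a key sits in.

```shell
# Hypothetical helper: print the value of a "key: value" option from a
# corosync.conf-style file, or "unset" if the key does not appear.
# It matches the key anywhere in the file, not only inside totem { }.
conf_option() {
    # $1 = option name (e.g. transport, netmtu), $2 = path to config file
    awk -v key="$1" -F':' '
        { k = $1; gsub(/[[:space:]]/, "", k) }
        k == key { v = $2; gsub(/[[:space:]]/, "", v); print v; found = 1; exit }
        END { if (!found) print "unset" }
    ' "$2"
}

# Example usage (path assumed, adjust as needed):
#   conf_option transport /etc/corosync/corosync.conf
#   conf_option netmtu    /etc/corosync/corosync.conf
```

If transport comes back as "unset", the cluster is running on whatever default the installed corosync version picks, so it is worth confirming that in the corosync.conf man page before assuming UDPU.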
One thing I remember - could it be because, at the time of cluster
formation (and for some time after), one of the nodes had a different ruby
version from the other nodes?
I cannot remember when this problem started to appear, whether it was from
the beginning or later.
I'm on CentOS 7.6. I do not think I use UDP (other than the creation of some
resources and constraints, it's a "vanilla" cluster). I use a
"non-default" MTU on the interfaces the cluster uses, and those interfaces
are net-team devices. But still... why is it always that one node (all
are virtually identical)?
many thanks, L.
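
With a non-default MTU on the cluster interfaces, one thing worth ruling out is a path between nodes that silently drops full-size packets. A rough sketch, assuming Linux iputils ping and IPv4; the function names and the MTU of 9000 in the usage comment are placeholders, not values from this cluster.

```shell
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Probe a peer with fragmentation prohibited (iputils ping: -M do).
# If this fails while a plain ping works, something on the path
# has a smaller MTU than the one being tested.
probe_mtu() {
    # $1 = peer hostname, $2 = interface MTU to test
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$2")" "$1"
}

# Example usage (MTU value is a placeholder):
#   probe_mtu rental.private 9000
```

Running the probe from the misbehaving node towards each of the other two, and back, would show whether only that node's path is affected.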
>
>> many thanks, L