[Pacemaker] Multicast pitfalls? corosync [TOTEM ] Retransmit List:
Jan Friesse
jfriesse at redhat.com
Mon Feb 17 09:17:04 UTC 2014

Beo,
this looks like a known (and already fixed) problem in the kernel. Take a
look at https://bugzilla.redhat.com/show_bug.cgi?id=880035, especially
comment 21. A kernel update helped at that time.
Honza
Beo Banks wrote:
> hi stefan,
>
> it seems to be more stable, but after 2 minutes the issue is back again.
> hopefully it isn't a bug, because i can reproduce it.
> on node2 only unicast arrives after sequence 256...
>
> node1:
>
> omping 10.0.0.22 10.0.0.21
>
> 10.0.0.22 :   unicast, seq=257, size=69 bytes, dist=0, time=0.666ms
> 10.0.0.22 : multicast, seq=257, size=69 bytes, dist=0, time=0.677ms
> 10.0.0.22 :   unicast, seq=258, size=69 bytes, dist=0, time=0.600ms
> 10.0.0.22 : multicast, seq=258, size=69 bytes, dist=0, time=0.610ms
> 10.0.0.22 :   unicast, seq=259, size=69 bytes, dist=0, time=0.693ms
> 10.0.0.22 : multicast, seq=259, size=69 bytes, dist=0, time=0.702ms
> 10.0.0.22 :   unicast, seq=260, size=69 bytes, dist=0, time=0.674ms
> 10.0.0.22 : multicast, seq=260, size=69 bytes, dist=0, time=0.685ms
> 10.0.0.22 :   unicast, seq=261, size=69 bytes, dist=0, time=0.658ms
> 10.0.0.22 : multicast, seq=261, size=69 bytes, dist=0, time=0.669ms
> 10.0.0.22 :   unicast, seq=262, size=69 bytes, dist=0, time=0.834ms
> 10.0.0.22 : multicast, seq=262, size=69 bytes, dist=0, time=0.845ms
> 10.0.0.22 :   unicast, seq=263, size=69 bytes, dist=0, time=0.666ms
> 10.0.0.22 : multicast, seq=263, size=69 bytes, dist=0, time=0.677ms
> 10.0.0.22 :   unicast, seq=264, size=69 bytes, dist=0, time=0.675ms
> 10.0.0.22 : multicast, seq=264, size=69 bytes, dist=0, time=0.687ms
> 10.0.0.22 : waiting for response msg
> 10.0.0.22 : server told us to stop
> ^C
>
> 10.0.0.22 :   unicast, xmt/rcv/%loss = 264/264/0%, min/avg/max/std-dev = 0.542/0.663/0.860/0.035
> 10.0.0.22 : multicast, xmt/rcv/%loss = 264/264/0%, min/avg/max/std-dev = 0.553/0.675/0.876/0.035
>
> node2:
>
> 10.0.0.21 : multicast, seq=251, size=69 bytes, dist=0, time=0.703ms
> 10.0.0.21 :   unicast, seq=252, size=69 bytes, dist=0, time=0.714ms
> 10.0.0.21 : multicast, seq=252, size=69 bytes, dist=0, time=0.725ms
> 10.0.0.21 :   unicast, seq=253, size=69 bytes, dist=0, time=0.662ms
> 10.0.0.21 : multicast, seq=253, size=69 bytes, dist=0, time=0.672ms
> 10.0.0.21 :   unicast, seq=254, size=69 bytes, dist=0, time=0.662ms
> 10.0.0.21 : multicast, seq=254, size=69 bytes, dist=0, time=0.673ms
> 10.0.0.21 :   unicast, seq=255, size=69 bytes, dist=0, time=0.668ms
> 10.0.0.21 : multicast, seq=255, size=69 bytes, dist=0, time=0.679ms
> 10.0.0.21 :   unicast, seq=256, size=69 bytes, dist=0, time=0.674ms
> 10.0.0.21 : multicast, seq=256, size=69 bytes, dist=0, time=0.687ms
> 10.0.0.21 :   unicast, seq=257, size=69 bytes, dist=0, time=0.618ms
> 10.0.0.21 :   unicast, seq=258, size=69 bytes, dist=0, time=0.659ms
> 10.0.0.21 :   unicast, seq=259, size=69 bytes, dist=0, time=0.705ms
> 10.0.0.21 :   unicast, seq=260, size=69 bytes, dist=0, time=0.682ms
> 10.0.0.21 :   unicast, seq=261, size=69 bytes, dist=0, time=0.760ms
> 10.0.0.21 :   unicast, seq=262, size=69 bytes, dist=0, time=0.665ms
> 10.0.0.21 :   unicast, seq=263, size=69 bytes, dist=0, time=0.711ms
> ^C
> 10.0.0.21 :   unicast, xmt/rcv/%loss = 263/263/0%, min/avg/max/std-dev = 0.539/0.661/0.772/0.037
> 10.0.0.21 : multicast, xmt/rcv/%loss = 263/256/2%, min/avg/max/std-dev = 0.583/0.674/0.786/0.033
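
One way to see where multicast stops in output like the above is to pull out the highest sequence number per traffic type. This is only a minimal sketch: it assumes the omping output was saved to a file (omping.log is a hypothetical name) and that reply lines keep the "seq=N" format shown here. The function name last_seq is made up for illustration.

```shell
# Report the highest sequence number seen for unicast and for multicast
# replies in omping output read from stdin. Summary lines (no "seq=") are
# ignored, and a leading "> " quote prefix does not matter.
last_seq() {
  awk '
    /seq=/ {
      kind = ($0 ~ /multicast/) ? "multicast" : "unicast"
      n = $0
      sub(/.*seq=/, "", n)   # drop everything up to the number
      sub(/,.*/, "", n)      # drop everything after the number
      if (n + 0 > last[kind]) last[kind] = n + 0
    }
    END {
      printf "unicast=%d multicast=%d\n", last["unicast"], last["multicast"]
    }
  '
}
```

Run as "last_seq < omping.log". If the two numbers differ, multicast replies stopped arriving while unicast kept going, which is the pattern in the node2 output.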
>
> 2014-02-14 9:59 GMT+01:00 Stefan Bauer <stefan.bauer at cubewerk.de>:
>
>> you have to disable all offloading features (rx, tx, tso...)
>>
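
A minimal sketch of what disabling "all" offloading could look like with ethtool -K. The feature list below (rx, tx, sg, tso, gso, gro, lro) is an assumption about what the "..." covers, the function name disable_offload is hypothetical, and whether this belongs on the host, the guest, or both is not settled in this thread. By default the function only prints the commands; set DRY_RUN=0 to run them as root.

```shell
# Print (DRY_RUN=1, the default) or run (DRY_RUN=0) the ethtool commands
# that switch off common offload features on each interface given.
disable_offload() {
  for dev in "$@"; do
    for feature in rx tx sg tso gso gro lro; do
      if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "ethtool -K $dev $feature off"
      else
        # Not every driver supports every feature; keep going on errors.
        ethtool -K "$dev" "$feature" off || true
      fi
    done
  done
}
```

For example, "disable_offload eth0 eth1" lists the commands for both interfaces mentioned in this thread, and "ethtool -k eth0" (lowercase k) shows the current settings afterwards.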
>>
>> Kind regards
>>
>> Stefan Bauer
>> --
>> Cubewerk GmbH
>> Herzog-Otto-Straße 32
>> 83308 Trostberg
>> 08621 - 99 60 237
>> HRB 22195 AG Traunstein
>> GF Stefan Bauer
>>
>> On 14.02.2014 at 09:40, "Beo Banks" <beo.banks at googlemail.com> wrote:
>>
>> ethtool -K eth0 tx off
>> ethtool -K eth1 tx off
>>
>> same result...retransmit issue
>>
>>
>> 2014-02-14 9:31 GMT+01:00 Beo Banks <beo.banks at googlemail.com>:
>>
>>> i have also tried
>>>
>>> "No more delay when you disable multicast snooping on the host:"
>>>
>>> echo 0 > /sys/devices/virtual/net/br1/bridge/multicast_router
>>> echo 0 > /sys/devices/virtual/net/br1/bridge/multicast_snooping
>>>
>>>
>>> 2014-02-14 9:28 GMT+01:00 Beo Banks <beo.banks at googlemail.com>:
>>>
>>>> @jan and stefan
>>>>
>>>> must i set it for both bridges, eth1 (br1) and eth0 (br0), and on the
>>>> host or on the guest?
>>>>
>>>>
>>>> 2014-02-14 9:06 GMT+01:00 Jan Friesse <jfriesse at redhat.com>:
>>>>
>>>>> Beo,
>>>>> are you experiencing a cluster split? If the answer is no, then you don't
>>>>> need to do anything; maybe a network buffer is just full. But if the
>>>>> answer is yes, try reducing the MTU size (netmtu in the configuration) to
>>>>> a value like 1000.
>>>>>
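
As a sketch of where that setting would go, assuming the cluster reads its totem options from /etc/corosync/corosync.conf (a stack that starts corosync another way may take the value from elsewhere), the fragment below lowers netmtu from its default of 1500. Everything except the netmtu line is a placeholder for the existing configuration.

```
totem {
    version: 2
    # lowered from the default of 1500
    netmtu: 1000
    # keep the existing interface { } block and other options unchanged
}
```

The value should be the same on both nodes, and corosync has to be restarted for it to take effect.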
>>>>> Regards,
>>>>>    Honza
>>>>>
>>>>> Beo Banks wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> i have a fresh 2 node cluster (kvm host1 -> guest = nodeA | kvm host2 ->
>>>>>> guest = NodeB) and it seems to work but from time to time i have a lot
>>>>>> of errors like
>>>>>>
>>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 196 198 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 197 199
>>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 197 199 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 196 198
>>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 196 198 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 197 199
>>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 197 199 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 196 198
>>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 196 198 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 197 199
>>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 197 199 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 196 198
>>>>>>
>>>>>> i am using the latest rhel 6.5 version.
>>>>>>
>>>>>> i have also already tried to solve the issue with
>>>>>> echo 1 > /sys/class/net/virbr0/bridge/multicast_querier (host system)
>>>>>> but no luck...
>>>>>>
>>>>>> i have disabled iptables and selinux... same issue
>>>>>>
>>>>>> how can i solve it?
>>>>>>
>>>>>> thanks beo
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>
>>>>>> Project Home: http://www.clusterlabs.org
>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>