[Pacemaker] Multicast pitfalls? corosync [TOTEM ] Retransmit List:

Beo Banks beo.banks at googlemail.com
Wed Mar 5 12:50:45 UTC 2014


Thanks Jan, but I am using the newest RHEL release and I still have the
issue. I could improve it somewhat with:
#!/bin/bash
# enable an IGMP querier and disable snooping on each bridge, then print the result
for br in virbr0 br0 br1; do
    echo 1 > /sys/class/net/$br/bridge/multicast_querier
    echo 0 > /sys/class/net/$br/bridge/multicast_snooping

    echo "cat /sys/class/net/$br/bridge/multicast_snooping"
    cat /sys/class/net/$br/bridge/multicast_snooping

    echo "cat /sys/class/net/$br/bridge/multicast_querier"
    cat /sys/class/net/$br/bridge/multicast_querier
done
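
To keep these settings across reboots I will probably append them to /etc/rc.local
(a rough, untested sketch; it assumes rc.local still runs last at boot on this
RHEL 6.5 host and that the libvirt bridge virbr0 already exists by then):

cat >> /etc/rc.local <<'EOF'
# re-apply bridge multicast settings (they are lost when the bridges are recreated)
for br in virbr0 br0 br1; do
    echo 1 > /sys/class/net/$br/bridge/multicast_querier
    echo 0 > /sys/class/net/$br/bridge/multicast_snooping
done
EOF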

But after a few days the cluster fences the other node -> network failure...


info: ais_mark_unseen_peer_dead: Node .com was not seen in the previous transition
Mar 04 19:23:33 corosync [pcmk  ] info: update_member: Node 352321546/u.com is now: lost
Mar 04 19:23:33 corosync [pcmk  ] info: send_member_notification: Sending membership update 780 to 2 children
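
The next time it happens I want to capture the state right before the fence,
roughly like this (a sketch; corosync-cfgtool and omping are the tools already
used in this thread, and I am assuming corosync is logging to /var/log/messages
here):

corosync-cfgtool -s                                   # ring status, look for FAULTY
grep "Retransmit List" /var/log/messages | tail -n 5  # recent retransmit bursts
for br in virbr0 br0 br1; do                          # did the bridge settings survive?
    echo -n "$br querier=";  cat /sys/class/net/$br/bridge/multicast_querier
    echo -n "$br snooping="; cat /sys/class/net/$br/bridge/multicast_snooping
done
omping 10.0.0.21 10.0.0.22                            # run on both nodes in parallel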

2014-02-17 10:17 GMT+01:00 Jan Friesse <jfriesse at redhat.com>:

> Beo,
> this looks like a known (and already fixed) problem in the kernel. Take a
> look at https://bugzilla.redhat.com/show_bug.cgi?id=880035, especially
> comment 21. A kernel update helped that time.
>
> Honza
>
> Beo Banks wrote:
>
>  Hi Stefan,
>>
>> it seems more stable now, but after 2 minutes the issue is back again.
>> Hopefully it isn't a bug, because I can reproduce it:
>> node2 sees only unicast after sequence 256...
>>
>> node1
>>
>> omping 10.0.0.22 10.0.0.21
>>
>> 10.0.0.22 :   unicast, seq=257, size=69 bytes, dist=0, time=0.666ms
>> 10.0.0.22 : multicast, seq=257, size=69 bytes, dist=0, time=0.677ms
>> 10.0.0.22 :   unicast, seq=258, size=69 bytes, dist=0, time=0.600ms
>> 10.0.0.22 : multicast, seq=258, size=69 bytes, dist=0, time=0.610ms
>> 10.0.0.22 :   unicast, seq=259, size=69 bytes, dist=0, time=0.693ms
>> 10.0.0.22 : multicast, seq=259, size=69 bytes, dist=0, time=0.702ms
>> 10.0.0.22 :   unicast, seq=260, size=69 bytes, dist=0, time=0.674ms
>> 10.0.0.22 : multicast, seq=260, size=69 bytes, dist=0, time=0.685ms
>> 10.0.0.22 :   unicast, seq=261, size=69 bytes, dist=0, time=0.658ms
>> 10.0.0.22 : multicast, seq=261, size=69 bytes, dist=0, time=0.669ms
>> 10.0.0.22 :   unicast, seq=262, size=69 bytes, dist=0, time=0.834ms
>> 10.0.0.22 : multicast, seq=262, size=69 bytes, dist=0, time=0.845ms
>> 10.0.0.22 :   unicast, seq=263, size=69 bytes, dist=0, time=0.666ms
>> 10.0.0.22 : multicast, seq=263, size=69 bytes, dist=0, time=0.677ms
>> 10.0.0.22 :   unicast, seq=264, size=69 bytes, dist=0, time=0.675ms
>> 10.0.0.22 : multicast, seq=264, size=69 bytes, dist=0, time=0.687ms
>> 10.0.0.22 : waiting for response msg
>> 10.0.0.22 : server told us to stop
>> ^C
>> 10.0.0.22 :   unicast, xmt/rcv/%loss = 264/264/0%, min/avg/max/std-dev =
>> 0.542/0.663/0.860/0.035
>> 10.0.0.22 : multicast, xmt/rcv/%loss = 264/264/0%, min/avg/max/std-dev =
>> 0.553/0.675/0.876/0.035
>>
>> node2:
>>
>> 10.0.0.21 : multicast, seq=251, size=69 bytes, dist=0, time=0.703ms
>> 10.0.0.21 :   unicast, seq=252, size=69 bytes, dist=0, time=0.714ms
>> 10.0.0.21 : multicast, seq=252, size=69 bytes, dist=0, time=0.725ms
>> 10.0.0.21 :   unicast, seq=253, size=69 bytes, dist=0, time=0.662ms
>> 10.0.0.21 : multicast, seq=253, size=69 bytes, dist=0, time=0.672ms
>> 10.0.0.21 :   unicast, seq=254, size=69 bytes, dist=0, time=0.662ms
>> 10.0.0.21 : multicast, seq=254, size=69 bytes, dist=0, time=0.673ms
>> 10.0.0.21 :   unicast, seq=255, size=69 bytes, dist=0, time=0.668ms
>> 10.0.0.21 : multicast, seq=255, size=69 bytes, dist=0, time=0.679ms
>> 10.0.0.21 :   unicast, seq=256, size=69 bytes, dist=0, time=0.674ms
>> 10.0.0.21 : multicast, seq=256, size=69 bytes, dist=0, time=0.687ms
>> 10.0.0.21 :   unicast, seq=257, size=69 bytes, dist=0, time=0.618ms
>> 10.0.0.21 :   unicast, seq=258, size=69 bytes, dist=0, time=0.659ms
>> 10.0.0.21 :   unicast, seq=259, size=69 bytes, dist=0, time=0.705ms
>> 10.0.0.21 :   unicast, seq=260, size=69 bytes, dist=0, time=0.682ms
>> 10.0.0.21 :   unicast, seq=261, size=69 bytes, dist=0, time=0.760ms
>> 10.0.0.21 :   unicast, seq=262, size=69 bytes, dist=0, time=0.665ms
>> 10.0.0.21 :   unicast, seq=263, size=69 bytes, dist=0, time=0.711ms
>> ^C
>> 10.0.0.21 :   unicast, xmt/rcv/%loss = 263/263/0%, min/avg/max/std-dev =
>> 0.539/0.661/0.772/0.037
>> 10.0.0.21 : multicast, xmt/rcv/%loss = 263/256/2%, min/avg/max/std-dev =
>> 0.583/0.674/0.786/0.033
>>
>>
>>
>>
>> 2014-02-14 9:59 GMT+01:00 Stefan Bauer <stefan.bauer at cubewerk.de>:
>>
>>  you have to disable all offloading features (rx, tx, tso...)
>>>
>>>
>>> Best regards
>>>
>>> Stefan Bauer
>>> --
>>> Cubewerk GmbH
>>> Herzog-Otto-Straße 32
>>> 83308 Trostberg
>>> 08621 - 99 60 237
>>> HRB 22195 AG Traunstein
>>> GF Stefan Bauer
>>>
>>> On 14.02.2014 at 09:40, "Beo Banks" <beo.banks at googlemail.com> wrote:
>>>
>>> ethtool -K eth0 tx off
>>> ethtool -K eth1 tx off
>>>
>>> same result...retransmit issue
>>>
>>>
>>> 2014-02-14 9:31 GMT+01:00 Beo Banks <beo.banks at googlemail.com>:
>>>
>>>  I have also tried
>>>>
>>>> "No more delay when you disable multicast snooping on the host:"
>>>>
>>>> echo 0 > /sys/devices/virtual/net/br1/bridge/multicast_router
>>>> echo 0 > /sys/devices/virtual/net/br1/bridge/multicast_snooping
>>>>
>>>>
>>>> 2014-02-14 9:28 GMT+01:00 Beo Banks <beo.banks at googlemail.com>:
>>>>
>>>> @jan and stefan
>>>>
>>>>>
>>>>> must I set it for both bridges,
>>>>> eth1 (br1) and eth0 (br0), on the host or on the guest?
>>>>>
>>>>>
>>>>> 2014-02-14 9:06 GMT+01:00 Jan Friesse <jfriesse at redhat.com>:
>>>>>
>>>>> Beo,
>>>>>
>>>>>> are you experiencing cluster splits? If the answer is no, then you don't
>>>>>> need to do anything; maybe a network buffer is just full. But if the
>>>>>> answer is yes, try reducing the MTU size (netmtu in the configuration)
>>>>>> to a value like 1000.
>>>>>>
>>>>>> Regards,
>>>>>>    Honza
>>>>>>
>>>>>> Beo Banks wrote:
>>>>>>
>>>>>>  Hi,
>>>>>>>
>>>>>>> I have a fresh 2-node cluster (kvm host1 -> guest = NodeA | kvm host2 ->
>>>>>>> guest = NodeB) and it seems to work, but from time to time I get a lot
>>>>>>> of errors like
>>>>>>>
>>>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 196 198 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 197 199
>>>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 197 199 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 196 198
>>>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 196 198 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 197 199
>>>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 197 199 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 196 198
>>>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 196 198 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 197 199
>>>>>>> Feb 13 13:41:04 corosync [TOTEM ] Retransmit List: 197 199 184 185 186 187 188 189 18a 18b 18c 18d 18e 18f 190 191 192 193 194 195 196 198
>>>>>>>
>>>>>>> I am using the newest RHEL 6.5 version.
>>>>>>>
>>>>>>> I have also already tried to solve the issue with
>>>>>>> echo 1 > /sys/class/net/virbr0/bridge/multicast_querier (host system)
>>>>>>> but no luck...
>>>>>>>
>>>>>>> I have disabled iptables and selinux... same issue.
>>>>>>>
>>>>>>> How can I solve it?
>>>>>>>
>>>>>>> thanks beo
>>>>>>>