[ClusterLabs] Can Bonding Cause a Broadcast Storm?

Jeremy Voorhis jvoorhis at gmail.com
Wed Nov 16 01:48:41 CET 2016


It's been a little while for me, but going by
https://www.kernel.org/doc/Documentation/networking/bonding.txt it looks as
if the driver sends one or more gratuitous ARPs out the active interface on
state changes. That could be close to the root of the problem, depending on
how the switches are deployed (standalone? stacked?). If you're in a
position to reproduce, you might capture ARP traffic from both interfaces.
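
A rough sketch of how that capture could be set up, in case it helps. It
assumes the bond is named bond0 (not stated in this thread) and that tcpdump
is installed. It reads the driver's status file, lists the slave interfaces,
and prints one tcpdump command per slave. The function names are made up for
illustration.

```python
from pathlib import Path


def parse_bond_status(text):
    """Pull the mode, active slave, and slave names out of the text
    found in /proc/net/bonding/<bond>."""
    info = {"mode": None, "active_slave": None, "slaves": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            info["mode"] = value
        elif key == "Currently Active Slave":
            info["active_slave"] = value
        elif key == "Slave Interface":
            info["slaves"].append(value)
    return info


def arp_capture_commands(info):
    """Build one tcpdump command per slave. -n skips name resolution,
    -e prints link-level (MAC) headers, 'arp' filters to ARP frames."""
    return [f"tcpdump -n -e -i {name} arp" for name in info["slaves"]]


if __name__ == "__main__":
    status = Path("/proc/net/bonding/bond0")  # assumption: bond is bond0
    if status.exists():
        info = parse_bond_status(status.read_text())
        print("mode:", info["mode"])
        print("active slave:", info["active_slave"])
        for cmd in arp_capture_commands(info):
            print(cmd)
    else:
        print(f"{status} not found; adjust the bond name")
```

Running the printed commands as root in separate terminals while the server
reboots should show which interface the gratuitous ARPs leave on and with
which source MAC.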

On Tue, Nov 15, 2016 at 3:49 PM Eric Robinson <eric.robinson at psmnv.com>
wrote:

Mode 1 (active-backup). No special switch configuration. Spanning tree is
not enabled. I have 100+ Linux servers, all of which use bonding. The
network has been stable for 10 years, with no recent changes. However, this
is the second time that we have seen high latency and traced it to the
behavior of one particular server. I'm wondering if there is something
about bonding that could result in a temporary bridge loop.
------------------------------
*From:* Jeremy Voorhis <jvoorhis at gmail.com>
*Sent:* Tuesday, November 15, 2016 2:13:59 PM
*To:* Cluster Labs - All topics related to open-source clustering welcomed
*Subject:* Re: [ClusterLabs] Can Bonding Cause a Broadcast Storm?

What bonding mode are you using? Some modes require additional
configuration from the switch to avoid flooding. Also, is spanning tree
enabled on the switches?

On Tue, Nov 15, 2016 at 1:26 PM Eric Robinson <eric.robinson at psmnv.com>
wrote:

If a Linux server with bonded interfaces attached to different switches is
rebooted, is it possible that a bridge loop could result for a brief
period? We noticed that one of our 100 Linux servers became unresponsive
and appears to have rebooted. (The cause has not been determined.) A couple
of minutes afterwards, we saw a gigantic spike in traffic on all switches
in the network that lasted for about 7 minutes, causing latency and packet
loss on the network. Everything was still reachable, but slowly. The
condition stopped as soon as the Linux server in question became reachable
again.

--
Eric Robinson


_______________________________________________
Users mailing list: Users at clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
