[Pacemaker] [Semi-OT] To bridge or to bond?

Vladislav Bogdanov bubble at hoster-ok.com
Sat May 5 17:06:47 EDT 2012


Hi,

05.05.2012 20:05, Arnold Krille wrote:
> Hi all,
> 
> please excuse (and ignore) this mail when you think it's not appropriate for 
> this list or too FAQ.
> 
> We had our servers all connected via one gigabit switch and used bonds to have 
> 2Gbit links for each of them (using drbd and pacemaker/corosync to keep our data 
> distributed and services/machines up and running).
> As the switch constitutes a SPOF, we wanted to eliminate this and put a second 
> GB-switch into the rack.
> Now I/we can't use the real bonding modes anymore, only fail-over, tlb and 
> alb. We don't really like the idea of fail-over because that means going back 
> to 1Gbit data rates. Using tlb we get nearly 2Gbit total throughput with 1Gbit 
> per connection, so that looks nice throughput-wise. But for simple ICMP pings, 
> 50-90% of pings are lost, probably due to the switches re-learning the MAC 
> addresses all the time. Also some TCP connections seem to stall due to this. 
> Not really a nice situation when desktop virtualization and terminal servers 
> are used in this scenario.
> 
> My questions:
> Is there something obvious I missed in the above configuration?(*)

From my experience, you are better off with switches that support a "stack"
operation mode together with cross-stack link aggregation (802.3ad, LACP
bonding), so that the individual physical links of an aggregated link are
connected to different physical switches in the stack. That way any stack
member may die and everything still works. You'll never reach 2Gbps with
1G+1G LACP btw, only about 1.8Gbps.
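
For reference, here is a minimal sketch of what the host side of such a
cross-stack LACP bond could look like (RHEL/CentOS-style ifcfg files; the
interface names, address and option values are placeholders, not taken from
your setup):

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  IPADDR=192.168.10.11
  NETMASK=255.255.255.0
  ONBOOT=yes
  BOOTPROTO=none
  # 802.3ad = LACP; the two slave ports go to different stack members
  BONDING_OPTS="mode=802.3ad miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4"

  # /etc/sysconfig/network-scripts/ifcfg-eth0 (and the same for eth1)
  DEVICE=eth0
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none

On the switch side the two ports have to be members of one cross-stack
port-channel (LACP mode active on Cisco), otherwise the aggregation will
not form properly.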

The Catalyst 3750-X supports up to 32 aggregated links in a stack, so that
should be above the current pacemaker scalability limit (16-24 nodes, if I
understand the current situation correctly).
Be warned that not every vendor's announced stacking mode actually works as
advertised - I only reached the goal with my third switch pair. I tried
D-Link (they do not work at all, although their whitepapers claim the
opposite), Supermicro (they work, but are too unstable), and in the end only
Cisco fulfilled my needs. HP's stacking solution seems to be only half-baked
btw, so I didn't try them at all.
Two years ago I bought two 24-port switches, each with two 10G SFP+ ports,
for ~$15k per pair - so the 24-port switch itself costs around $5k (the
SFP+ module was $2.5k).
I've spent a lot of time trying to find something that works for less
money, but failed.

> Would it improve the situation stability- and performance-wise when I use 
> bridges instead of bonds to connect to the switches and let stp do its job? 
> Would that work with clusters and drbd?

I'm afraid bridges have too high a failover timeout when they switch to a
backup path. And an STP bridge uses only one of the available paths anyway...
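
Just to put a number on that (these are the usual kernel defaults, not
something from your setup): with classic STP a blocked port has to go
through listening and learning, i.e. 2 x forward_delay, which is 2 x 15
seconds by default, before it starts forwarding on the alternate path. You
can check and lower that, but reconvergence still takes many seconds:

  # show the STP parameters of a bridge (br0 is just a placeholder name)
  brctl showstp br0

  # lower the forward delay to 4 seconds; failover is still far slower
  # than a bonding-driver link switch
  brctl setfd br0 4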

I use active-backup bonding in a two-node cluster, where the primary link
is back-to-back 10G Ethernet and the standby link is a VLAN on the 10G
connection to the switch. With some tuning, corosync works fine when the
bond switches to the backup link.
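
If it helps, here is a rough sketch of that kind of setup (interface names,
addresses and timer values are made up to illustrate the idea, not copied
from my config):

  # bonding options: active-backup, prefer the back-to-back 10G link
  BONDING_OPTS="mode=active-backup miimon=100 primary=eth2 updelay=5000"

  # /etc/corosync/corosync.conf - give the totem protocol enough slack
  # to survive the bond failing over to the standby link
  totem {
      version: 2
      token: 10000                             # ms before token loss is declared
      token_retransmits_before_loss_const: 10
      consensus: 12000                         # must be at least 1.2 * token
      interface {
          ringnumber: 0
          bindnetaddr: 192.168.10.0
      }
  }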

> Obviously the cleanest solution would be to use two stackable switches and 
> make sure that they still do their job when one fails. But that is out of the 
> question due to the prices attached to the switches...

You'd better think again...

> 
> Thanks for your input on this and have a nice remaining weekend,
> 
> Arnold
> 
> (*) I haven't yet looked into the switches' configuration to see if they have 
> special options for such a scenario...

Best,
Vladislav

