[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: Antw: Re: EL6, cman, rrp, unicast and iptables

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Sep 17 06:30:05 UTC 2015


>>> Noel Kuntze <noel at familie-kuntze.de> wrote on 15.09.2015 at 18:07 in
message <55F8423E.9040403 at familie-kuntze.de>:

> 
> Hello Ulrich,
> 
>> If you run a protocol between A and B where neither A's interface nor B's
>> interface shows any errors, and B reports a protocol error, the obvious
>> conclusion is that the protocol is broken. Especially if the protocol
>> claims to implement reliable in-order transfer.
>> Of course, from "no interface errors" you cannot deduce "no protocol
>> errors", but if both parties use the same software, the bad protocol can
>> only come from the software.
> 
> Agreed.
> 
>> NFS over UDP has the feature that it works under load, even if some
>> packets are dropped. I cannot confirm that property for TOTEM.
> 
> NFS is for bulk data transfer and has to deal with network saturation.
> TOTEM isn't designed for this. Taking this from the first link[1]:
> "The Totem single-ring protocol"
> * Provides rapid detection of network partitioning and processor failure
>   together with reconfiguration and membership services.
> 
> This property of TOTEM works against the desired robustness against
> network problems. I think the TOTEM configuration can be made more
> tolerant of the known problems and of the sensitivity to datagram losses
> by increasing the retransmission timeout and threshold. Doing this
> increases the time it takes for Corosync to detect dead nodes, though.
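
(For illustration only: such a tuning would go in the totem section of
corosync.conf. The values below are made up for the example, not
recommendations; see corosync.conf(5) for the defaults and interdependencies
in your corosync version.)

    totem {
        # Token loss timeout in ms; raising it tolerates longer network
        # hiccups but delays failure detection.
        token: 10000
        # Number of retransmit attempts before the token is declared lost.
        token_retransmits_before_loss_const: 10
        # Must be at least 1.2 * token.
        consensus: 12000
    }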

I kind of disagree: I see TOTEM as a transport protocol for cluster messages,
NOT as a detection mechanism for node failures. Of course TOTEM, being a
token-passing protocol, has to detect node failures to ensure proper
operation; that's what the timeouts in the totem protocol are for.
In your view, however, a node failure event would be the consequence of a
ring failure. As that is not the case here (no node fails, but the TOTEM ring
fails), I still claim that totem is either broken, or the barely documented
interdependency of totem parameters is too much for the user to get right.
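
(As a rough sketch of that interdependency, assuming the rule from
corosync.conf(5) that consensus must be at least 1.2 * token: with
token = 10000 ms, a dead node is only declared lost after the 10 s token
timeout plus roughly 12 s of consensus, i.e. on the order of 20 s before a
new membership forms. The exact figures depend on the corosync version and
the other totem parameters.)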

We found that DLM/cLVM don't play well with TOTEM under load, so the
conclusion was that the totem protocol doesn't cope well with network load.
If I'm right, DLM also uses TOTEM to exchange information. If so, this would
also support my view of TOTEM being a cluster transport protocol rather than
a node failure detection mechanism.

Regards,
Ulrich

> 
> - -- 
> 
> Kind Regards,
> Noel Kuntze
> 
> GPG Key ID: 0x63EC6658
> Fingerprint: 23CA BB60 2146 05E7 7278 6592 3839 298F 63EC 6658
> 






