[ClusterLabs] Questions about the infamous TOTEM retransmit list
zzhou at suse.com
Tue Jan 12 23:32:52 EST 2021
On 1/12/21 4:23 PM, Ulrich Windl wrote:
> Before setting up our first pacemaker cluster we thought one low-speed redundant network would be good in addition to the normal high-speed network.
> However, as it seems now (SLES15 SP2), there is NO reasonable RRP mode to drive such a configuration with corosync.
> Passive RRP mode with UDPU still sends each packet through both nets,
Indeed, packets are sent in round-robin fashion.
> being throttled by the slower network.
> (Originally we were using multicast, but that was even worse)
> Now I realized that even under modest load, I see messages about "retransmit list", like this:
> Jan 08 10:57:56 h16 corosync: [TOTEM ] Retransmit List: 3e2
> Jan 08 10:57:56 h16 corosync: [TOTEM ] Retransmit List: 3e2 3e4
> Jan 08 11:13:21 h16 corosync: [TOTEM ] Retransmit List: 60e 610 612 614
> Jan 08 11:13:21 h16 corosync: [TOTEM ] Retransmit List: 610 614
> Jan 08 11:13:21 h16 corosync: [TOTEM ] Retransmit List: 614
> Jan 08 11:13:41 h16 corosync: [TOTEM ] Retransmit List: 6ed
What is the latency of this low-speed link?
I guess it is rather large, and the link is probably not suitable for this use
unless you modify the default corosync.conf carefully. To put it another way,
by default corosync is meant for a local network with low latency.
Also, it is not designed for links whose latencies differ widely.
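As a rough sketch of that kind of tuning, the totem section could give the token
more time to circulate. The values below are illustrative assumptions, not tested
recommendations for this particular link, and the rest of the totem section is
left out:

```
totem {
    version: 2
    # Assumed value: milliseconds to wait for the token before declaring it lost.
    token: 10000
    # Assumed value: token retransmits attempted before forming a new configuration.
    token_retransmits_before_loss_const: 10
}
```

A longer token timeout tolerates a slower link, but it also delays the detection
of a node that has really failed.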
> Questions on that:
> Will the situation be much better with knet?
knet provides "link_mode: passive", which is not round-robin and so comes
somewhat closer to what you have in mind. But it still doesn't fit your case
well, since knet again assumes similar latency across links. You may have to
tune parameters for the low-speed link, and will likely sacrifice the benefit
of the fast link.
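For illustration only, a passive knet setup with two links might look roughly
like the fragment below. The link numbers and priorities are assumptions (link 0
as the fast network, link 1 as the slow one), and the nodelist with the
ring0_addr/ring1_addr entries for each node is omitted:

```
totem {
    version: 2
    transport: knet
    # Use only one link at a time instead of sending over all of them.
    link_mode: passive

    interface {
        # Assumed to be the fast network; the higher priority is preferred.
        linknumber: 0
        knet_link_priority: 2
    }
    interface {
        # Assumed to be the slow network; used when link 0 is down.
        linknumber: 1
        knet_link_priority: 1
    }
}
```

The slow link would probably still need its own knet_ping_interval and
knet_ping_timeout settings so that it is not flagged as down under normal latency.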
> Is there a smooth migration path from UDPU to knet?
Off the top of my head, corosync 3 needs a restart when switching from "transport: udpu" to