[ClusterLabs] Antw: [EXT] Re: Questions about the infamous TOTEM retransmit list
zzhou at suse.com
Wed Jan 13 04:34:19 EST 2021
On 1/13/21 3:31 PM, Ulrich Windl wrote:
>>>> Roger Zhou <zzhou at suse.com> wrote on 13.01.2021 at 05:32 in message
> <97ac2305-85b4-cbb0-7133-ac137214331c at suse.com>:
>> On 1/12/21 4:23 PM, Ulrich Windl wrote:
>>> Before setting up our first pacemaker cluster we thought one low-speed
>>> redundant network would be good in addition to the normal high-speed network.
>>> However, as it seems now (SLES15 SP2), there is NO reasonable RRP mode to
>>> drive such a configuration with corosync.
>>> Passive RRP mode with UDPU still sends each packet through both nets,
>> Indeed, packets are sent in round-robin fashion.
>>> being throttled by the slower network.
>>> (Originally we were using multicast, but that was even worse)
>>> Now I realized that even under modest load, I see messages about "retransmit
>>> list", like this:
>>> Jan 08 10:57:56 h16 corosync: [TOTEM ] Retransmit List: 3e2
>>> Jan 08 10:57:56 h16 corosync: [TOTEM ] Retransmit List: 3e2 3e4
>>> Jan 08 11:13:21 h16 corosync: [TOTEM ] Retransmit List: 60e 610 612
>>> Jan 08 11:13:21 h16 corosync: [TOTEM ] Retransmit List: 610 614
>>> Jan 08 11:13:21 h16 corosync: [TOTEM ] Retransmit List: 614
>>> Jan 08 11:13:41 h16 corosync: [TOTEM ] Retransmit List: 6ed
>> What's the latency of this low-speed link?
> The normal net is fibre-based:
> 4 packets transmitted, 4 received, 0% packet loss, time 3058ms
> rtt min/avg/max/mdev = 0.131/0.175/0.205/0.027 ms
> The redundant net is copper-based:
> 5 packets transmitted, 5 received, 0% packet loss, time 4104ms
> rtt min/avg/max/mdev = 0.293/0.304/0.325/0.019 ms
Aha, RTT < 1 ms, so the network is fast enough. That clears up my doubt; I had
guessed the latency of the slow link might be in the tens or even hundreds of
milliseconds. Then I would wonder whether the corosync packets simply had bad
luck and got delayed by workload on one of the links.
>>> Questions on that:
>>> Will the situation be much better with knet?
>> knet provides "link_mode: passive", which might partly fit what you have in
>> mind. But it still doesn't fit your use case well, since knet again assumes
>> similar latency among the links. You may have to tune parameters for the slow
>> link and will likely sacrifice the benefit of the fast link.
> Well, in the past when using HP Service Guard, everything worked quite differently:
> There was a true heartbeat on each cluster net, determining its "being alive" status, and when the cluster performed no action there was no traffic on the cluster links (except that heartbeat).
> When the cluster actually had to talk, it used the link flagged "alive", preferring the primary first, then the secondary, when both were available.
"link_mode: passive" together with knet_link_priority would be useful. Also,
use sctp in knet could be the alternative too.
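For illustration, a minimal corosync 3.x (knet) sketch along those lines might
look like the following. The addresses, node id, and priority values are
placeholders; the option names (link_mode, knet_link_priority, knet_transport)
are as documented in the corosync.conf(5) man page:

```
totem {
    version: 2
    transport: knet
    # passive: only the highest-priority available link carries traffic
    link_mode: passive
    interface {
        linknumber: 0
        # fast fibre network: higher priority, preferred while it is up
        knet_link_priority: 2
    }
    interface {
        linknumber: 1
        # slow copper network: fallback only
        knet_link_priority: 1
        # optionally run this link over SCTP instead of UDP
        knet_transport: sctp
    }
}
nodelist {
    node {
        nodeid: 1
        ring0_addr: 192.0.2.16    # placeholder address on the fast net
        ring1_addr: 198.51.100.16 # placeholder address on the slow net
    }
}
```

With this, the slow link should stay idle apart from knet's link heartbeats
until the fast link fails, which is closer to the behaviour you described.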