[ClusterLabs] Antw: Troubleshooting Faulty Networks / Heartbeat Rings

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Wed Oct 26 13:54:17 UTC 2016


>>> Martin Schlegel <martin at nuboreto.org> wrote on 26.10.2016 at 13:55 in
message
<1875006565.5761.eadc80df-ed0f-4dcf-bc75-89991bd8c2a1.open-xchange at email.1und1.d 
>:
> Hello all
> 
> On one of our test clusters the network seems to be dropping messages at
> different times of the day. We know it was not a network latency issue; we
> could prove that with iperf, a network test utility.
> 
> However, I wish there were more detailed logs than the retransmit log
> messages we are seeing. Even with debug enabled in Corosync it was next to
> impossible for me to get confirmation from the logs about what is causing it
> and how it affects the heartbeat ring.
> 
> How can I track the heartbeat ring in action using time stamps, first to
> understand how it operates in detail and finally to tune its configuration
> parameters and troubleshoot it adequately?
> 
> It seems there is little documentation on this topic (besides the source
> code). Could somebody please point me to some useful sources of information?

The best thing I ever found was corosync-blackbox ;-)
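
To line the retransmit messages up against the time of day, a small script along the following lines may help. It is only a sketch: it assumes syslog-style timestamps at the start of each line and that the retransmit messages contain the text "Retransmit List:" followed by sequence numbers. The function name and the sample lines (host name, PID, sequence numbers) are made up for illustration; adjust the pattern to whatever your logs actually look like.

```python
import re
from collections import Counter

# Assumed line layout (syslog style), e.g.:
#   "Oct 26 13:54:17 node1 corosync[1234]:  [TOTEM ] Retransmit List: 2f 30 31"
# Group 1 captures the hour, group 2 the listed sequence numbers.
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+(\d{2}):\d{2}:\d{2}.*Retransmit List:\s*(.*)$"
)


def retransmits_per_hour(lines):
    """Count retransmitted sequence numbers per hour of day (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            hour, seq_numbers = match.groups()
            counts[hour] += len(seq_numbers.split())
    return dict(counts)


# Illustrative sample only, not taken from a real cluster.
SAMPLE = [
    "Oct 26 13:54:17 node1 corosync[1234]:  [TOTEM ] Retransmit List: 2f 30 31",
    "Oct 26 13:54:18 node1 corosync[1234]:  [TOTEM ] Retransmit List: 31",
    "Oct 26 14:02:05 node1 corosync[1234]:  [TOTEM ] A new membership was formed.",
    "Oct 26 14:10:41 node1 corosync[1234]:  [TOTEM ] Retransmit List: 4a 4b",
]

if __name__ == "__main__":
    for hour, count in sorted(retransmits_per_hour(SAMPLE).items()):
        print(f"{hour}:00  {count} retransmitted messages")
```

Feeding it the real log file instead of SAMPLE (for example via `open(path)`) gives a rough per-hour picture that can be compared with what the network was doing at those times.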

> 
> 
> Regards,
> Martin Schlegel
> 
> _______________________________________________
> Users mailing list: Users at clusterlabs.org 
> http://clusterlabs.org/mailman/listinfo/users 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org 







