[ClusterLabs] Troubleshooting Faulty Networks / Heartbeat Rings
Martin Schlegel
martin at nuboreto.org
Wed Oct 26 11:55:52 UTC 2016
Hello all
On one of our test clusters the network seems to be dropping messages at
different times of the day. We know it is not a network latency issue; we
ruled that out with iperf, a network testing utility.
However, I wish there were more detailed logs than the retransmit log
messages we are seeing. Even with debug logging enabled in Corosync, it was
next to impossible to confirm from the logs what is causing the drops and
how they affect the heartbeat ring.
How can I track the heartbeat ring in action, with timestamps, first to
understand in detail how it operates, and then to tune its configuration
parameters and troubleshoot it properly?
It seems there is little documentation on this topic (besides the source code).
Could somebody please point me to some useful sources of information?
Regards,
Martin Schlegel