[ClusterLabs] Troubleshooting Faulty Networks / Heartbeat Rings
martin at nuboreto.org
Wed Oct 26 07:55:52 EDT 2016
On one of our test clusters, the network seems to be dropping messages at
different times of the day. We know it is not a network latency issue: we
could prove that with iperf, a local network test utility.
However, I wish there were more detailed logs than the retransmit log
messages we are seeing. Even with debug enabled in Corosync, it was next to
impossible for me to get confirmation from the logs about what is causing the
drops and how they affect the heartbeat ring.
How can I track the heartbeat ring in action using timestamps, first to
understand how it operates in detail and finally to tune its configuration
parameters and troubleshoot it adequately?
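To make the "different times of the day" part concrete, here is a minimal
sketch in Python that buckets retransmit log lines by hour. It rests on
assumptions: that logging timestamps are enabled (timestamp: on in the
logging section of corosync.conf), that each log line starts with a
syslog-style "Mon DD HH:MM:SS" stamp, and that retransmits show up as lines
containing "Retransmit List:". The exact line layout varies between versions
and log targets, so the regular expression and the function name
retransmits_per_hour are illustrative only.

```python
import re
from collections import Counter

# Assumed line shape (an assumption, not a guaranteed Corosync format):
#   "Oct 26 07:55:52 ... [TOTEM ] Retransmit List: 1a 1b 1c"
LINE_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}"
    r".*Retransmit List:\s*(?P<seqs>.*)$"
)


def retransmits_per_hour(lines):
    """Count retransmit lines and listed sequence numbers per (month, day, hour)."""
    line_counts = Counter()
    seq_counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a retransmit line, or not in the assumed layout
        key = (match["month"], int(match["day"]), int(match["hour"]))
        line_counts[key] += 1
        # Each whitespace-separated token is one sequence number in the list.
        seq_counts[key] += len(match["seqs"].split())
    return line_counts, seq_counts
```

Feeding it the lines of the Corosync log file (wherever the logging section
of corosync.conf points) gives a per-hour count that can be compared against
other activity on the network. It only shows when retransmits cluster, not
why they happen.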
It seems there is little documentation on this topic (besides the source code).
Could somebody please point me to some useful sources of information?