[ClusterLabs] Cluster node loss detection.

Digimer lists at alteeve.ca
Fri Oct 16 11:03:40 EDT 2015

On 16/10/15 10:51 AM, Vallevand, Mark K wrote:
> It looks like it takes 20s for a cluster to detect that a node has been
> lost.

Loss is detected by corosync, and it declares loss after X lost totem
tokens, each token being declared lost after Y milliseconds. By default,
node loss should be detected in about 1 second of no network traffic,
but you need to check corosync's settings.
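
As a rough illustration of the arithmetic, here is a minimal Python sketch.
It assumes the timing model described in corosync.conf(5): a failure is
noticed after the token timeout, a new membership forms after the consensus
timeout, and consensus defaults to 1.2 times token when it is not set. The
function name and the sample token values are made up for illustration, so
read the real values from your own configuration.

```python
def estimated_detection_ms(token_ms, consensus_ms=None):
    """Rough estimate, in ms, of the time to declare a node lost.

    Assumption: detection takes about token + consensus, with
    consensus defaulting to 1.2 * token when it is not configured.
    """
    if consensus_ms is None:
        # Integer arithmetic avoids floating-point rounding noise.
        consensus_ms = token_ms * 12 // 10
    return token_ms + consensus_ms


# Hypothetical token values, not read from any real cluster.
for token_ms in (1000, 5000, 10000):
    print(token_ms, "->", estimated_detection_ms(token_ms), "ms")
```

Under this model a token of 10000 ms works out to roughly 22 seconds, which
is in the same ballpark as the 20s reported, though whether that is the
value in effect on this cluster is only a guess.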

> The detection seems to correlate to dlm reporting its lost connection to
> the node.

Negative. DLM is informed when a node is declared lost and blocks until
fenced/stonithd tells it that the peer has been successfully fenced.
Only then does it reap the lost locks and recover.
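
The ordering can be sketched as a toy state model in Python. This is not
the DLM API; the class, method, and lock names are invented purely to show
the sequence: node declared lost, lockspace blocks, fence confirmed, lost
locks reaped, lockspace resumes.

```python
class ToyLockspace:
    """Toy model of the ordering only; this is not the DLM API."""

    def __init__(self, locks):
        # locks maps a lock name to the node that holds it.
        self.locks = dict(locks)
        self.pending = set()   # lost nodes still waiting to be fenced
        self.blocked = False

    def node_lost(self, node):
        # Membership reports the node as gone: block until it is fenced.
        self.pending.add(node)
        self.blocked = True

    def fence_confirmed(self, node):
        # Ignore confirmations for nodes that were never declared lost.
        if node not in self.pending:
            return
        self.pending.discard(node)
        # Reap the locks that were held by the fenced node.
        self.locks = {k: v for k, v in self.locks.items() if v != node}
        # Resume only when no other lost node is still unfenced.
        self.blocked = bool(self.pending)


ls = ToyLockspace({"lv_a": "node1", "lv_b": "node2"})
ls.node_lost("node2")
print("blocked after loss:", ls.blocked)
ls.fence_confirmed("node2")
print("blocked after fence:", ls.blocked, "locks:", ls.locks)
```

The point of the model is that the lockspace stays blocked for as long as
fencing takes, so a slow or failing fence shows up as a long DLM stall
rather than as slow loss detection.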

> Not sure if correlation is causation.


> Anyway, can someone tell me where that 20s might be coming from and if
> it is adjustable? 
> Ubuntu 12.04 LTS
> pacemaker 1.1.10
> cman 3.1.7
> corosync 1.4.6
> Thanks!
> Regards.
> Mark K Vallevand   Mark.Vallevand at Unisys.com
> Never try and teach a pig to sing: it's a waste of time, and it annoys
> the pig.
> MATERIAL and is thus for use only by the intended recipient. If you
> received this in error, please contact the sender and delete the e-mail
> and its attachments from all computers.

This suffix has zero legal bearing, just saying. Anything posted to this
list is 100% open and public.

Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
