[ClusterLabs] corosync.conf token configuration

Jan Friesse jfriesse at redhat.com
Wed Jan 3 05:34:56 EST 2018


Adrián,

> Hello,
>
> I was wondering if someone has a description of the parameters token, token_retransmit, token_retransmits_before_loss_const and consensus. I have read about them in the corosync.conf man page, but while trying some cluster configurations I realized that I could not control when the new configuration or stonith was going to happen. I have tried corosync 1.X and 2.X on several virtual servers (debian-9).

token timeout = the time after which a node declares the token lost and
begins to form a new membership. The effective value is token +
(number_of_nodes - 2) * token_coefficient. This applies to corosync 2.x;
1.x doesn't have token_coefficient.
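
A minimal sketch of that formula in Python, in case it helps to plug in
numbers. The function name is made up for illustration (it is not a corosync
API). The 650 ms default for token_coefficient is the value documented in the
corosync.conf man page for 2.x, and clamping the node term at zero for
clusters with fewer than 3 nodes is an assumption of this sketch.

```python
def effective_token_timeout(token_ms, number_of_nodes, token_coefficient_ms=650):
    """Effective token timeout in ms: token + (number_of_nodes - 2) * token_coefficient."""
    # Assumption: the coefficient term never shortens the timeout, so it is
    # clamped at zero for 1- and 2-node clusters.
    extra_nodes = max(number_of_nodes - 2, 0)
    return token_ms + extra_nodes * token_coefficient_ms


if __name__ == "__main__":
    print(effective_token_timeout(20000, 2))  # 20000
    print(effective_token_timeout(20000, 5))  # 20000 + 3 * 650 = 21950
```

For corosync 1.x, which has no token_coefficient, pass token_coefficient_ms=0.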

consensus is just the maximum time to wait until all nodes agree on the new
membership (so the consensus timeout takes effect after the token timeout).
On timeout, the node tries to create a different membership.

token_retransmits_before_loss_const is used only to compute
token_retransmit. The formula is token_retransmit = token_timeout /
(token_retransmits_before_loss_const + 0.2).
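
As a rough illustration of the token_retransmit formula above, here is a small
Python sketch. The function name is hypothetical, and truncating the result to
whole milliseconds is an assumption of this sketch.

```python
def token_retransmit_ms(token_timeout_ms, token_retransmits_before_loss_const):
    """token_retransmit = token_timeout / (token_retransmits_before_loss_const + 0.2)."""
    # Assumption: the result is truncated to whole milliseconds.
    return int(token_timeout_ms / (token_retransmits_before_loss_const + 0.2))


if __name__ == "__main__":
    # Values from the configuration in this thread: token 20000, const 20.
    print(token_retransmit_ms(20000, 20))  # 990
```

So a larger token_retransmits_before_loss_const makes the retransmit interval
shorter; it does not change the token timeout itself.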

token_retransmit is used to make membership more stable. If the token is
not received within the given time, the previous token is retransmitted. So
if the token was lost on the network (which is possible because of UDP), it
is sent again.

>
> corosync.conf:
>
> 	# How long before declaring a token lost (ms)
> 	token: 20000
>
> 	# Consensus, time before token lost to stonith the server (ms)
> 	# consensus: 60000
>
> 	# Interval between tokens (ms)
> 	# token_retransmit: 10000
>
> 	# How many tokens retransmitted before forming a new configuration
> 	token_retransmits_before_loss_const: 20
>
> I expected the token to be declared lost within 20s after the processor failed (for example, connection lost to the servers), then "token_retransmits_before_loss_const" should act (I don’t know how it works), and then 24s pass before the “new configuration” message and the stonith (default consensus = 1.2 * token -> 1.2 * 20s = 24s). In brief, the cluster stays up for barely 44s without connection before the reboot (stonith) is done. In contrast, I expected the parameter "token_retransmits_before_loss_const: 20” to delay the token loss while it is trying to reconnect.

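So that means 20 sec to detect the failure + (1.2 * 20 = 24 sec) to form a
new membership -> 44 sec. token_retransmits_before_loss_const does not
extend this; it only changes how often the token is retransmitted within
the token timeout.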
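
The same arithmetic as a small Python sketch. The function name is
illustrative only. It assumes the token_coefficient term is zero (two nodes,
or corosync 1.x) and uses the default consensus = 1.2 * token mentioned in
the question when consensus is not set. Since consensus is a maximum wait,
the result is best read as an upper estimate for one membership attempt.

```python
def time_to_new_membership_ms(token_ms, consensus_ms=None):
    """Token timeout plus consensus timeout, in ms."""
    if consensus_ms is None:
        # Default consensus = 1.2 * token, done in integer arithmetic
        # to avoid floating-point rounding.
        consensus_ms = token_ms * 12 // 10
    return token_ms + consensus_ms


if __name__ == "__main__":
    print(time_to_new_membership_ms(20000))         # 44000
    print(time_to_new_membership_ms(20000, 60000))  # 80000
```

The second call uses the commented-out consensus: 60000 from the
configuration in the question.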


> Am I right?
>
> On the other hand, if I use the consensus parameter, I can calculate exactly when the stonith is going to happen.
>
> Please, if someone knows the answer I will appreciate any help.
> Thank you
>
>

Regards,
   Honza
