[ClusterLabs] Cluster node loss detection.

Digimer lists at alteeve.ca
Fri Oct 16 16:51:27 UTC 2015


On 16/10/15 12:37 PM, Vallevand, Mark K wrote:
> Fencing, yes.  I have pcmk-redirect for each node in cluster.conf.

Do you have stonith configured (and tested!) in Pacemaker as well?

> I run with default cman settings for corosync.  No totem clause.  That gives the 20s detection.  Not sure what the defaults really are.
> I added <totem token="1000" token_retransmits_before_loss_const="5" /> to cluster.conf and get about a 5s detection.
> 
> The corosync man page says:
>        token  This timeout specifies in milliseconds until a token loss is declared after not receiving a token.  This is the time spent detecting a
>               failure of a processor in the current configuration.  Reforming a new configuration takes about 50 milliseconds in  addition  to  this
>               timeout.
> 
>               The default is 1000 milliseconds.
> 
>        token_retransmit
>               This timeout specifies in milliseconds after how long before receiving a token the token is retransmitted.  This will be automatically
>               calculated if token is modified.  It is not recommended to alter this value without guidance from the corosync community.
> 
>               The default is 238 milliseconds.
> 
>        hold   This timeout specifies in milliseconds how long the token should be held by the representative when the protocol is under low
>               utilization.  It is not recommended to alter this value without guidance from the corosync community.
> 
>               The default is 180 milliseconds.
> 
>        token_retransmits_before_loss_const
>               This  value  identifies  how  many  token  retransmits  should be attempted before forming a new configuration.  If this value is set,
>               retransmit and hold will be automatically calculated from retransmits_before_loss and token.
> 
>               The default is 4 retransmissions.
> 
> But I don't know what cman sets these to.  They aren't these values, and they aren't the values in the cman man page, which says this:

Maybe it's changed by the Ubuntu packagers? I don't know; I don't use
Debian or Ubuntu.
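
For what it's worth, the 238 ms and 180 ms defaults in the corosync man
page text quoted above look like they are derived from token and
token_retransmits_before_loss_const. A minimal sketch of that arithmetic
follows. The formulas are an assumption based on my reading of the
corosync totem config code, and the function name and the 10 ms tick are
made up for illustration, so treat it as a sanity check only:

```python
# Sketch only: the formulas below are an assumption based on a reading of
# corosync's totem config code, not something cman documents.

TICK_MS = 10  # assumed scheduler tick (1000 / HZ with HZ = 100)


def derived_timers(token_ms, retransmits_before_loss_const):
    """Return (token_retransmit_ms, hold_ms) derived from the two inputs."""
    token_retransmit = int(token_ms / (retransmits_before_loss_const + 0.2))
    hold = int(token_retransmit * 0.8 - TICK_MS)
    return token_retransmit, hold


if __name__ == "__main__":
    # corosync.conf(5) defaults, then the values set in cluster.conf above.
    for token, const in ((1000, 4), (1000, 5)):
        print(token, const, derived_timers(token, const))
```

With the corosync.conf(5) defaults (token 1000, 4 retransmits) this gives
238 and 180, which matches the man page. It says nothing about what cman
or a distribution package actually sets.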

>               Cman uses different defaults for some of the corosync parameters listed in corosync.conf(5).  If you wish to use a non-default
>               setting, they can be configured in cluster.conf as shown above.  Cman uses the following default values:
> 
>                 <totem
>                   vsftype="none"
>                   token="10000"
>                   token_retransmits_before_loss_const="20"
>                   join="60"
>                   consensus="4800"
>                   rrp_mode="none"
>                   <!-- or rrp_mode="active" if altnames are present >
>                 />
>                
> So, it looks like setting the corosync parameters in cluster.conf has some effect.  Cman seems to pass them to corosync.

Yes. Never configure corosync directly when using cman; only use
cluster.conf, as you did.
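
If you want to double-check which totem overrides a given cluster.conf
carries, something like the sketch below would do. The sample document is
hypothetical (the cluster name and config_version are made up; only the
totem attributes come from this thread), and the helper name is mine:

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down cluster.conf; only the <totem> attributes come
# from this thread. The cluster name and config_version are made up.
SAMPLE = """\
<cluster name="example" config_version="2">
  <totem token="1000" token_retransmits_before_loss_const="5"/>
</cluster>
"""


def totem_overrides(xml_text):
    """Return the attributes of the first <totem> element as a dict."""
    root = ET.fromstring(xml_text)
    totem = root.find("totem")
    return dict(totem.attrib) if totem is not None else {}


if __name__ == "__main__":
    print(totem_overrides(SAMPLE))
```

To run it against a real node, read the file contents (usually
/etc/cluster/cluster.conf) and pass them in instead of SAMPLE. An empty
result means no totem clause is present, so cman's own defaults apply.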

> Onward!
> 
> 
> Regards.
> Mark K Vallevand   Mark.Vallevand at Unisys.com
> Never try and teach a pig to sing: it's a waste of time, and it annoys the pig.
> 
> 
> -----Original Message-----
> From: Digimer [mailto:lists at alteeve.ca] 
> Sent: Friday, October 16, 2015 11:18 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] Cluster node loss detection.
> 
> On 16/10/15 11:40 AM, Vallevand, Mark K wrote:
>> Thanks.  I wasn't completely aware of corosync's role in this.  I see new things in the docs every time I read them.
>>
>> I looked up the corosync settings at one time and did it again:
>> 	token loss 3000ms
>> 	retransmits 10
>> So 30s.  Redid my simple testing and got detection times of 22s, 26s, and 25s using very crude methods.
>> Any warnings about setting these values to something else?
>> We require our customers to use an isolated, private network for cluster communications.  All taken care of in our instructions and cluster configuration scripts.  Network traffic will not be a factor.  So, I'm thinking 1000ms and 5 retransmits as an experiment.
> 
> That is very high. I think the default is something like 236ms x 4 losses.
> 
> You do have fencing, right?
> 
>> I was pretty sure that DLM was just being informed by clustering, but I needed to ask.
>>
>> Again, thanks.
>> 	
>>
> 
> 


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



