[ClusterLabs] Establishing Timeouts

Tue Oct 11 03:39:24 EDT 2016

On 10/10/16 19:35, Eric Robinson wrote:
> Basically, when we turn off a switch, I want to keep the cluster from failing over before Linux bonding has had a chance to recover. 
> 
> I'm mostly interested in preventing false-positive cluster failovers that might occur during manual network maintenance (for example, testing switch and link outages). 
> 
> 
>>> Thanks for the clarification. So what's the easiest way to ensure 
>>> that the cluster waits a desired timeout before deciding that a re-convergence is necessary?
> 
>> By raising the token (lost) timeout I would say.
> 
>> Please correct me (Chrissie), but I see the token (lost) timeout
>> as resilience against static delays plus jitter on top, and
>> token_retransmits_before_loss_const
>> as resilience against packet loss.
> 
> 

Yes, the token timeout is the recommended way to cope with such things.
Generally we recommend leaving the other parameters alone unless you
really know what you're doing or have had guidance from us or the
Red Hat support team. Corosync adjusts several of the other parameters
according to the token timeout, so the relationship is not always as
simple as a fixed set of independent defaults.
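To make that dependency concrete, here is a minimal Python sketch of how
some timers follow from token when they are not set explicitly. The
formulas (token_retransmit = token / (token_retransmits_before_loss_const
+ 0.2), consensus = 1.2 x token) and the defaults (token 1000 ms,
token_retransmits_before_loss_const 4) are assumptions based on the
corosync.conf(5) descriptions for corosync 2.x; other versions may differ,
and the function name is hypothetical.

```python
# Sketch only: assumed corosync 2.x auto-calculation of totem timers from
# `token` (per corosync.conf(5)); check the man page for your version.

def derived_timers(token_ms: int, retransmits_before_loss: int = 4) -> dict:
    """Return the timers corosync is assumed to derive from a token timeout (ms)."""
    # token_retransmit: how long to wait before resending an unacknowledged token
    token_retransmit = int(token_ms / (retransmits_before_loss + 0.2))
    # consensus: how long to wait for agreement before forming a new membership
    # (integer arithmetic for 1.2 * token, avoiding float rounding)
    consensus = token_ms * 12 // 10
    return {
        "token": token_ms,
        "token_retransmit": token_retransmit,
        "consensus": consensus,
    }


if __name__ == "__main__":
    # Default token versus a raised token for switch maintenance
    for token in (1000, 10000):
        print(derived_timers(token))
```

With the assumed defaults this gives a 238 ms retransmit interval, which
matches the default token_retransmit documented for corosync 2.x; raising
token to 10000 ms stretches the derived timers with it.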

Bear in mind that increasing token_retransmits_before_loss_const will
also increase the load on the network and on the nodes in the cluster,
though the effect will probably be minimal. I do recommend testing any
values you come up with.
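For the switch-maintenance case, a hypothetical sizing sketch: pick a
token timeout comfortably above the worst-case Linux bonding failover
time, and see how raising token_retransmits_before_loss_const (with token
fixed) shortens the retransmit interval, i.e. more retransmissions while
packets are being lost. The failover figure, the 2x margin, the rounding
and all helper names are illustrative assumptions; the retransmit formula
token / (token_retransmits_before_loss_const + 0.2) is assumed from the
corosync 2.x corosync.conf(5) description.

```python
import math

# Hypothetical helpers for sizing corosync's totem token timeout.
# All inputs (bond failover time, margin) are assumptions: measure your own.

def suggest_token_ms(bond_failover_ms: int, margin: float = 2.0,
                     minimum_ms: int = 1000) -> int:
    """Token timeout covering the bond failover time times a safety margin,
    rounded up to a whole second and never below minimum_ms."""
    if bond_failover_ms < 0 or margin < 1.0:
        raise ValueError("need bond_failover_ms >= 0 and margin >= 1.0")
    padded = bond_failover_ms * margin
    return max(minimum_ms, math.ceil(padded / 1000) * 1000)


def retransmit_interval_ms(token_ms: int, retransmits_before_loss: int) -> int:
    """Assumed token_retransmit interval: a higher constant means shorter
    intervals, so more retransmissions (more load) within one token period."""
    return int(token_ms / (retransmits_before_loss + 0.2))


def totem_fragment(token_ms: int) -> str:
    """Only `token` is set; merge this into the existing totem { } section of
    corosync.conf rather than replacing it."""
    return "totem {\n    token: %d\n}\n" % token_ms


if __name__ == "__main__":
    token = suggest_token_ms(3500)  # assumed 3.5 s worst-case bond failover
    print(totem_fragment(token))
    for const in (4, 10):
        print(const, retransmit_interval_ms(token, const))
```

With an assumed 3.5 s failover this suggests token: 7000. Whatever value
you arrive at, verify it by actually pulling links and switches in a test
window before relying on it.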

Chrissie