[ClusterLabs] Re: Establishing Timeouts
Ulrich.Windl at rz.uni-regensburg.de
Mon Oct 10 02:38:39 EDT 2016
>>> Eric Robinson <eric.robinson at psmnv.com> wrote on 10.10.2016 at 06:51 in message
<DM5PR03MB2729841C99A2D3F431E4064CFADB0 at DM5PR03MB2729.namprd03.prod.outlook.com>
> I have about a dozen corosync+pacemaker clusters and I am just now getting
> around to understanding timeouts.
> Most of my corosync.conf files look something like this:
> version: 2
> token: 5000
> token_retransmits_before_loss_const: 10
> join: 1000
> consensus: 7500
> vsftype: none
> max_messages: 20
> secauth: off
> threads: 0
> clear_node_high_bit: yes
> rrp_mode: active
> If I understand this correctly, this means the node will wait 50 seconds
> (5000ms x 10) before deciding that a cluster reconfig is necessary (perhaps
> after a link failure). Is that correct?
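As far as I read corosync.conf(5), the two values are not multiplied: "token" is the whole window after which token loss is declared, and "token_retransmits_before_loss_const" only sets how many retransmits are attempted inside that window. After token loss, "consensus" is the time allowed to reach agreement before a new membership round starts. A rough sketch of the arithmetic with the values quoted above (the variable names are mine, and this ignores any per-node adjustment corosync may add to the token timeout):

```python
# Values copied from the quoted corosync.conf (all in milliseconds).
token_ms = 5000
retransmits_before_loss = 10
consensus_ms = 7500

# Assumption: token is the full loss-detection window; retransmits happen
# inside it and do not multiply it.
detect_ms = token_ms

# Assumption: after token loss, up to `consensus` more time may pass before
# a new membership configuration is started.
reconfig_ms = token_ms + consensus_ms

# The reading from the question (token x retransmits), for comparison only.
naive_ms = token_ms * retransmits_before_loss

# corosync.conf(5) states that consensus must be at least 1.2 * token.
consensus_ok = consensus_ms >= 1.2 * token_ms

print(f"token loss declared after  : {detect_ms} ms")
print(f"reconfiguration starts by  : {reconfig_ms} ms")
print(f"token x retransmits reading: {naive_ms} ms")
print(f"consensus >= 1.2 * token   : {consensus_ok}")
```

So with these settings the cluster would react in seconds rather than after 50 seconds; please check the man page of your corosync version before relying on the numbers.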
> I'm trying to understand how this works together with my bonded NIC's
> arp_interval settings. I normally set arp_interval=1000. My question is, how
> many arp losses are required before the bonding driver decides to failover to
> the other link? If arp_interval=1000, how many times does the driver send an
> arp and fail to receive a reply before it decides that the link is dead?
AFAIK, if _all_ ARP targets fail to respond _once_, the link will be considered down after "Down Delay".
I guess you want to use multiple ARP IP targets (and the right ones)... If you have or need a gateway, it's not the worst choice to use that as one of them.
> I think I need to know this so I can set my corosync.conf settings correctly
> to avoid "false positive" cluster failovers. In other words, if there is a
> link or switch failure, I want to make sure that the cluster allows plenty of
> time for link communication to recover before deciding that a node has
> actually died.
> Eric Robinson
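One way to sanity-check the two layers against each other is to estimate how long the bond needs to fail over and compare that with the corosync token timeout. The sketch below is only an illustration: "missed_probes" is a placeholder for however many ARP intervals your kernel's bonding driver waits before marking a slave down (I am not stating the real number here; see the bonding documentation for your kernel), and the safety margin is an arbitrary choice of mine.

```python
def bond_failover_ms(arp_interval_ms: int, missed_probes: int) -> int:
    """Estimated time until the bonding driver gives up on a link.

    missed_probes is a hypothetical parameter: the number of ARP intervals
    without a reply that the driver tolerates before failing over.
    """
    return arp_interval_ms * missed_probes


def token_covers_failover(token_ms: int, failover_ms: int, margin: float = 1.5) -> bool:
    """True if the token timeout exceeds the failover time by the given margin."""
    return token_ms >= margin * failover_ms


arp_interval_ms = 1000   # from the question
missed_probes = 3        # placeholder value, not taken from the driver
token_ms = 5000          # from the quoted corosync.conf

failover_ms = bond_failover_ms(arp_interval_ms, missed_probes)
print(f"estimated bond failover: {failover_ms} ms")
print(f"token timeout sufficient: {token_covers_failover(token_ms, failover_ms)}")
```

If the check fails for the real numbers of your setup, either raise "token" (and "consensus" with it) or shorten the ARP interval.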