[ClusterLabs] What is timeout for initial fencing after startup?
Andrew Beekhof
andrew at beekhof.net
Mon Mar 30 00:58:03 UTC 2015
> On 3 Mar 2015, at 9:10 am, Andreas Kurz <andreas.kurz at gmail.com> wrote:
>
> On 2015-02-28 07:07, Andrei Borzenkov wrote:
>> On Fri, 27 Feb 2015 22:45:56 +0100
>> Andreas Kurz <andreas.kurz at gmail.com> wrote:
>>
>>> On 2015-02-27 10:40, Andrei Borzenkov wrote:
>>>> I'm testing what happens in a 2-node cluster when one node is not
>>>> present at startup. It appears that pacemaker gives up its attempt to
>>>> stonith the other node after 10 minutes. Where do these 10 minutes
>>>> come from?
>>>
>>> I'd say these 10 minutes come from the cluster property 'stonith-timeout'
>>> with its default of 60s, combined with the feature that stops trying to
>>> fence a node after the 10th failed fencing attempt (10 x 60s = 600s).
>>>
>>
>> Do you happen to know if "10th failed attempt" is documented somewhere?
>> I'm trying to find cluster or stonith property that may look like it is
>> related but cannot. Only stonith-timeout is present in documentation.
>
> I'm only aware of this git commit in pacemaker, which mentions the 10
> stonith attempts:
>
> commit e29d2f9f627e492032ce64563ee3def0ff1b000d
> Author: Andrew Beekhof <andrew at beekhof.net>
> Date: Fri Jul 6 16:18:19 2012 +1000
>
> High: crmd: Block after 10 failed fencing attempts for a node

I've made a note to document this.
You really want to be looking into why your fencing is so unreliable though.
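
For reference, the arithmetic Andreas describes above can be sketched as
follows. This is only an illustration, not pacemaker code: the function name
and parameters are hypothetical, and it assumes every attempt runs for the
full 'stonith-timeout' and that retries follow immediately, which is a
simplification.

```python
def fencing_give_up_seconds(stonith_timeout_s: int = 60, max_attempts: int = 10) -> int:
    """Rough upper bound on how long fencing is retried before the cluster blocks.

    Hypothetical helper for illustration only.
    stonith_timeout_s: the 'stonith-timeout' cluster property (60s default, per the thread).
    max_attempts: the limit of 10 failed fencing attempts named in the commit above.
    Assumes each attempt uses its full timeout and retries start immediately.
    """
    return stonith_timeout_s * max_attempts


total = fencing_give_up_seconds()
print(f"{total} s = {total // 60} min")  # prints "600 s = 10 min"
```

With a different 'stonith-timeout', the same estimate scales linearly, for
example fencing_give_up_seconds(120) gives 1200 seconds (20 minutes).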