[Pacemaker] Stonith: How to avoid deathmatch cluster partitioning

Klaus Darilion klaus.mailinglists at pernau.at
Thu May 16 05:31:41 EDT 2013


Hi Andreas!

On 15.05.2013 22:55, Andreas Kurz wrote:
> On 2013-05-15 15:34, Klaus Darilion wrote:
>> On 15.05.2013 14:51, Digimer wrote:
>>> On 05/15/2013 08:37 AM, Klaus Darilion wrote:
>>>> primitive st-pace1 stonith:external/xen0 \
>>>>           params hostlist="pace1" dom0="xentest1" \
>>>>           op start start-delay="15s" interval="0"
>>>
>>> Try;
>>>
>>> primitive st-pace1 stonith:external/xen0 \
>>>           params hostlist="pace1" dom0="xentest1" delay="15" \
>>>           op start start-delay="15s" interval="0"
>>>
>>> The idea here is that, when both nodes lose contact and initiate a
>>> fence, 'st-pace1' will get a 15 second reprieve. That is, 'st-pace2'
>>> will wait 15 seconds before trying to fence 'st-pace1'. If st-pace1 is
>>> still alive, it will fence 'st-pace2' without delay, so pace2 will be
>>> dead before its timer expires, preventing a dual-fence. However, if
>>> pace1 really is dead, pace2 will fence it and recover, just with a 15
>>> second delay.
>>
>> Sounds good, but pacemaker does not accept the parameter:
>>
>>     ERROR: st-pace1: parameter delay does not exist
>
> start-delay is an option of the monitor operation ... in fact means
> "don't trust that start was successful, wait for the initial monitor
> some more time"
>
> The problem is, this would only make sense for one single stonith
> resource that can fence more nodes. In case of a split-brain that would
> delay the start on that node where the stonith resource was not running
> before and gives that node a "penalty".

Thanks for the clarification. I had already suspected that the
start-delay workaround would not be useful in my setup.

> In your example with two stonith resources running all the time,
> Digimer's suggestion is a good idea: use one of the redhat fencing
> agents, most of them have some sort of "stonith-delay" parameter that
> you can use with one instance.

I find it somewhat confusing that a generic parameter (a delay is useful
for all stonith agents) is implemented in each agent rather than in
pacemaker itself. Also, downloading the RH source RPMs and extracting
the agents is quite cumbersome.

I think I will add the delay parameter to the relevant fencing agent
myself. I guess I will also have to increase stonith-timeout by the
configured delay.
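
Roughly what I have in mind is sketched below. This is only a sketch
under assumptions, not the actual external/xen0 code: I assume the
plugin parameters reach the script as environment variables, "delay" is
the new (so far non-existent) parameter in seconds, and fence_delay and
delayed_fence are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers for an external/* stonith plugin script.
# Assumption: "delay" is a new parameter (seconds), exported to the
# script as an environment variable like the other plugin parameters.

# Print the number of seconds to wait before fencing.
# Falls back to 0 if "delay" is unset, empty, or not a plain integer.
fence_delay() {
    case "${delay:-0}" in
        *[!0-9]*) echo 0 ;;
        *)        echo "${delay:-0}" ;;
    esac
}

# Wait for the configured delay, then run the real fencing command,
# which is passed in as arguments.
delayed_fence() {
    sleep "$(fence_delay)"
    "$@"
}
```

The reset/off branch of the agent would then call its existing fencing
command through delayed_fence. With delay=15 on one of the two stonith
resources, stonith-timeout would have to grow by the same 15 seconds so
that the delayed fence operation is not cut off early.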

Do you know how to submit patches for the stonith agents?

Thanks
Klaus



