[Pacemaker] Stonith: How to avoid deathmatch cluster partitioning

Klaus Darilion klaus.mailinglists at pernau.at
Wed May 15 08:37:53 EDT 2013


I have a 2-node cluster: a simple test setup with an
ocf:heartbeat:IPaddr2 resource, using Xen VMs and stonith:external/xen0.
Please see the complete config below.

Basically everything works fine, except in the case of broken corosync 
communication between the nodes (simulated by shutting down the network 
link used for corosync communication). In this case, both nodes almost 
at the same time detect that the other node went offline 'unclean' and 
shoot the other node in the head, causing a reboot of both nodes.

I know that the cluster network should be reliable, so this scenario
should not happen. But is there a way to avoid a deathmatch when the
cluster communication is down for some reason, but the stonith network
still works?

For me the obvious solution would be to use different timeouts for
triggering the head-shot. I tried "start-delay" as suggested in
http://www.gossamer-threads.com/lists/linuxha/pacemaker/80918 but both
nodes still trigger the head-shot immediately.
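
To illustrate why different timeouts should break the tie, here is a toy
model of the race. It is only a sketch and does not reproduce Pacemaker's
actual behaviour. The function name fence_race, the fence_duration
parameter, and the assumption that a fence request always completes once
it has been issued are all hypothetical. Both nodes notice the
communication loss at t=0 and each waits for its own delay before
shooting.

```python
def fence_race(delay_pace1, delay_pace2, fence_duration):
    """Return the set of nodes that end up shot in a toy fencing race.

    Assumptions (hypothetical, not Pacemaker behaviour):
    - both nodes detect the peer as 'unclean' at t=0,
    - each node issues its fence request after its own delay,
    - a request takes fence_duration seconds to kill its target,
    - a node that is already dead cannot issue a request,
    - a request that has been issued always completes.
    """
    if delay_pace1 <= delay_pace2:
        first, second = "pace1", "pace2"
    else:
        first, second = "pace2", "pace1"
    d_first = min(delay_pace1, delay_pace2)
    d_second = max(delay_pace1, delay_pace2)

    # The node with the shorter delay always gets its shot off.
    shot = {second}

    # The slower node only shoots back if it is still alive when its
    # own delay expires, i.e. before the first shot has landed.
    if d_second < d_first + fence_duration:
        shot.add(first)
    return shot


if __name__ == "__main__":
    # Identical delays: both nodes are shot (the deathmatch).
    print(fence_race(0, 0, fence_duration=2))
    # pace1 waits 15s, pace2 shoots at once: only pace1 is shot.
    print(fence_race(15, 0, fence_duration=2))
    # The delay is shorter than the fence duration: still a deathmatch.
    print(fence_race(1, 0, fence_duration=2))
```

In this model the extra delay only helps if it is longer than the time
a fence operation needs to complete.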

Do I use the parameter correctly (please see config below)?

Are there other possibilities to solve this problem?

As a workaround, is it possible to use different timeout parameters in
corosync.conf on each node, or should they always be identical?


node pace1
node pace2
primitive ip_service ocf:heartbeat:IPaddr2 \
         params ip="" nic="eth0" cidr_netmask="24" iflabel="pace" \
         op monitor interval="60s"
primitive st-pace1 stonith:external/xen0 \
         params hostlist="pace1" dom0="xentest1" \
         op start start-delay="15s" interval="0"
primitive st-pace2 stonith:external/xen0 \
         params hostlist="pace2" dom0="xentest2"
location l-st-pace1 st-pace1 -inf: pace1
location l-st-pace2 st-pace2 -inf: pace2
property $id="cib-bootstrap-options" \
         dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
         cluster-infrastructure="openais" \
         expected-quorum-votes="2" \
         stonith-enabled="true" \
