[Pacemaker] stonith not triggered on resource failure

Cal Heldenbrand cal at fbsdata.com
Wed Aug 1 12:32:03 EDT 2012

Hi everyone,

I'm getting my memcached cluster setup closer to operational now, but I'm
running into one small problem: when my memcached resource check fails,
the stonith primitive isn't triggered to reset the node.  Stonith only
fires when the node is loaded heavily enough to cause corosync to fail.
When it does fire, it resets the node correctly.

Here are the relevant snippets of my config.  fence_virsh is used only in
my testing environment of Xen VMs.

node mem1
node mem2
node mem3
primitive mem1-xen-host stonith:fence_virsh \
        op monitor interval="1s" timeout="5s" \
        params ipaddr="vmhost1" login="root" action="reboot" \
        identity_file="/root/.ssh/id_dsa" port="mem1" pcmk_host_list="mem1" \
        pcmk_host_check="static-list" pcmk_host_map="" verbose="true" \
        debug="/var/log/vmhost1.log" \
        meta is-managed="true"
primitive memcached ocf:fbs:memcached \
        meta is-managed="true" \
        op monitor interval="1s" timeout="1s"
clone mem1-xen-host-clone mem1-xen-host \
        meta target-role="Started"
clone memcached_clone memcached \
        params ordered="false" \
        meta target-role="Started" migration-threshold="1"

# stonith device for mem1 should never run on mem1
location st-mem1-not-on-mem1 mem1-xen-host-clone -inf: mem1

# ensure ip-mem1 has a working memcache
colocation ip-mem1-on-memcache inf: cluster-ip-mem1 memcached_clone

# ensure ip-mem2 does not live on the same node as ip-mem1
# UNLESS the other 2 nodes are down.
colocation ip-mem2-not-on-ip-mem1 -10000: cluster-ip-mem2 cluster-ip-mem1

And here's what the cluster status looks like when the memcached service
check is failing, but the node is still up.

Online: [ mem1 mem2 mem3 ]

 cluster-ip-mem2        (ocf::heartbeat:IPaddr2):       Started mem2
 cluster-ip-mem1        (ocf::heartbeat:IPaddr2):       Started mem3
 Clone Set: memcached_clone [memcached]
     Started: [ mem2 mem3 ]
     Stopped: [ memcached:2 ]
 Clone Set: mem1-xen-host-clone [mem1-xen-host]
     Started: [ mem2 mem3 ]
     Stopped: [ mem1-xen-host:2 ]

What configuration directive can I add to force a stonith event against a
node when its memcached_clone instance fails and is stopped?
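
One candidate is the on-fail attribute of the monitor operation, for which
Pacemaker documents a "fence" value.  A minimal, untested sketch, assuming
stonith-enabled is true (that property is not shown in the snippets above)
and keeping in mind that only mem1 has a stonith device in this config:

```
# Sketch only: on-fail="fence" asks the cluster to fence the node
# when this monitor operation fails, instead of just stopping the resource.
primitive memcached ocf:fbs:memcached \
        meta is-managed="true" \
        op monitor interval="1s" timeout="1s" on-fail="fence"

# Assumption: fencing must be enabled cluster-wide for on-fail="fence"
# to have any effect.
property stonith-enabled="true"
```

With a 1s timeout on the monitor, a single slow check would be enough to
trigger a reboot, so the timeout may need loosening before trying this.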

Thank you!

