[Pacemaker] stonith not triggered on resource failure

Andrew Beekhof andrew at beekhof.net
Mon Aug 6 00:58:53 EDT 2012


On Thu, Aug 2, 2012 at 2:32 AM, Cal Heldenbrand <cal at fbsdata.com> wrote:
> Hi everyone,
>
> I'm starting to get my memcached cluster setup more operational now.  But
> I'm running into one small problem -- when my memcached resource check
> fails, the stonith primitive isn't triggered to reset the node.  Fencing
> only happens when the node is loaded heavily enough to cause corosync to
> fail.  When stonith does fire, it resets the node correctly.
>
> Here are the relevant snippets of my config.  fence_virsh is used just for
> my testing environment of Xen VMs.
>
> ------------------------------------------------------------------------------------------------------------------------
> node mem1
> node mem2
> node mem3
> primitive mem1-xen-host stonith:fence_virsh \
>         op monitor interval="1s" timeout="5s" \
>         params ipaddr="vmhost1" login="root" action="reboot" \
>                identity_file="/root/.ssh/id_dsa" port="mem1" \
>                pcmk_host_list="mem1" pcmk_host_check="static-list" \
>                pcmk_host_map="" verbose="true" \
>                debug="/var/log/vmhost1.log" \
>         meta is-managed="true"
> primitive memcached ocf:fbs:memcached \
>         meta is-managed="true" \
>         op monitor interval="1s" timeout="1s"
> clone mem1-xen-host-clone mem1-xen-host \
>         meta target-role="Started"
> clone memcached_clone memcached \
>         params ordered="false" \
>         meta target-role="Started" migration-threshold="1"
>
> # stonith device for mem1 should never run on mem1
> location st-mem1-not-on-mem1 mem1-xen-host-clone -inf: mem1
>
> # ensure ip-mem1 has a working memcache
> colocation ip-mem1-on-memcache inf: cluster-ip-mem1 memcached_clone
>
> # ensure ip-mem2 does not live on the same node as ip-mem1
> # UNLESS the other 2 nodes are down.
> colocation ip-mem2-not-on-ip-mem1 -10000: cluster-ip-mem2 cluster-ip-mem1
> -----------------------------------------------------------------------------------------------------------------------------
>
> And here's what the cluster status looks like when the memcached service
> check is failing, but the node is still up.

Add on-fail=fence to the memcached monitor op definition.
That seems a little severe, though :-)
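
Roughly like this, as a sketch based on the primitive you posted (only
the on-fail attribute is new; it also assumes stonith-enabled is true,
which isn't shown in the snippets):

------------------------------------------------------------------------
primitive memcached ocf:fbs:memcached \
        meta is-managed="true" \
        op monitor interval="1s" timeout="1s" on-fail="fence"
------------------------------------------------------------------------

With that, a failed monitor should get the node fenced, instead of the
default on-fail behaviour of just recovering the resource.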

>
> -----------------------------------------------------------------------------------------------------------------------------
> Online: [ mem1 mem2 mem3 ]
>
>  cluster-ip-mem2        (ocf::heartbeat:IPaddr2):       Started mem2
>  cluster-ip-mem1        (ocf::heartbeat:IPaddr2):       Started mem3
>  Clone Set: memcached_clone [memcached]
>      Started: [ mem2 mem3 ]
>      Stopped: [ memcached:2 ]
>  Clone Set: mem1-xen-host-clone [mem1-xen-host]
>      Started: [ mem2 mem3 ]
>      Stopped: [ mem1-xen-host:2 ]
> -----------------------------------------------------------------------------------------------------------------------------
>
> What configuration directive can I add that would force the stonith event to
> run when the memcached_clone is stopped?
>
> Thank you!
>
> --Cal