[Pacemaker] migration-threshold question

Juha Heinanen jh at tutpro.com
Sat Mar 21 11:04:17 EDT 2009


i have a resource that used to have this crm definition:

primitive test lsb:test \
	op monitor interval="30s" timeout="5s" \
	meta target-role="Started"

when i stopped the resource with

/etc/init.d/test stop

pacemaker restarted it, as i expected.

then i modified the "test" init script so that starting the resource
always failed.  the result was that pacemaker kept trying to restart
it forever, without migrating the group of primitives (of which "test"
is the last member) to the other node.

i searched the archives and found this about the migration-threshold
parameter:

  If you used pacemaker 1.0 you would not have to deal with 
  failure-stickiness anymore, but could use the very nice new 
  "migration-threshold" feature. Set this to 1 and after 1 failure, the 
  resource will failover, regardless of its score.
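
as i read the quoted text, the rule amounts to a per-node failure
counter compared against the threshold.  a rough python sketch of that
reading follows.  it is only a model, not pacemaker code: the function
name and arguments are made up for illustration, and treating a
threshold of 0 as "feature disabled" is my assumption.

```python
# hypothetical model of the rule described in the quote; not pacemaker code
def node_still_eligible(fail_count: int, migration_threshold: int) -> bool:
    """Return True while this node may still host the resource.

    Assumption: a threshold of 0 (or less) means the feature is disabled,
    so the node is never ruled out because of failures alone.
    """
    if migration_threshold <= 0:
        return True
    # the node is ruled out once the failure count reaches the threshold
    return fail_count < migration_threshold


if __name__ == "__main__":
    # walk through 0..4 failures with a threshold of 3
    for failures in range(5):
        print(failures, node_still_eligible(failures, migration_threshold=3))
```

note that the sketch only says when a node stops being eligible; it
says nothing about whether another node is able to take the resource.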

so i set migration-threshold to 3, hoping that after three failed
attempts to restart the resource the group would migrate to the
other node:

primitive test lsb:test \
	op monitor interval="30s" timeout="5s" \
	meta target-role="Started" migration-threshold="3"

the result, however, was that after 3 restart attempts the resource
stayed "Stopped" on the node where it failed:

============
Last updated: Sat Mar 21 19:02:07 2009
Current DC: lenny2 (f13aff7b-6c94-43ac-9a24-b118e62d5325)
Version: 1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160
2 Nodes configured.
2 Resources configured.
============

Node: lenny1 (8df8447f-6ecf-41a7-a131-c89fd59a120d): online
Node: lenny2 (f13aff7b-6c94-43ac-9a24-b118e62d5325): online

Master/Slave Set: ms-drbd0
    drbd0:0	(ocf::heartbeat:drbd):	Master lenny1
    drbd0:1	(ocf::heartbeat:drbd):	Slave lenny2
Resource Group: sip-proxy-group
    fs0	(ocf::heartbeat:Filesystem):	Started lenny1
    mysql-server	(lsb:mysql):	Started lenny1
    radius-server	(lsb:freeradius):	Started lenny1
    virtual-ip	(ocf::heartbeat:IPaddr2):	Started lenny1
    test	(lsb:test):	Stopped 

Failed actions:
    test_monitor_30000 (node=lenny1, call=30, rc=7, status=complete): not running

the question:  what am i missing here, i.e., what should i add to the
crm config in order to get the group migrated to the other node if
restarting "test" fails 3 times?

-- juha



