[Pacemaker] Resource stop during migration

Michael Smith msmith at cbnco.com
Fri Aug 27 01:22:09 EDT 2010


Hi,

I have a pacemaker setup using the Xen resource agent and I've found 
something weird during migration: if a VM is in the middle of 
live-migrating from node 1 to node 2, and I stop the resource in crm, 
pacemaker forgets about the migration and immediately thinks the resource 
is stopped, although it doesn't actually call the stop action. Meanwhile, 
the migration continues and the VM ends up running on node 2.

This can cause problems: let's say you put both nodes into standby one 
after the other. The cluster starts migrating a VM from node 1 to node 2, 
then considers the resource stopped when node 2 goes into standby, but the 
migration continues and the VM is left running on node 2.

Later, when the nodes are brought out of standby, the cluster starts the VM 
on node 1 while it is still running on node 2, which hoses the filesystem.

Is there a way around this? I'm not sure there is a clean way to 
abort a Xen live migration, but even if there were, the cluster isn't 
calling any actions so there'd be no way to trigger the abort.

I've tried with op_defaults record-pending="false" and "true", and with 
and without the monitor op on the Xen resource. Here's part of the log 
from a run with record-pending="false" and the following Xen primitive:

primitive vm-test2 ocf:heartbeat:Xen \
	meta allow-migrate="true" target-role="Started" \
	op monitor interval="10" \
	params xmfile="/etc/xen/vm/vm-test2"


Aug 26 15:55:49 xen-test1 pengine: [5147]: info: complex_migrate_reload: Migrating vm-test2 from xen-test1 to xen-test2
Aug 26 15:55:49 xen-test1 pengine: [5147]: notice: LogActions: Migrate resource vm-test2        (Started xen-test1 -> xen-test2)
Aug 26 15:55:52 xen-test1 pengine: [5147]: info: complex_migrate_reload: Migrating vm-test2 from xen-test1 to xen-test2
Aug 26 15:55:52 xen-test1 pengine: [5147]: notice: LogActions: Migrate resource vm-test2        (Started xen-test1 -> xen-test2)
Aug 26 15:55:58 xen-test1 lrmd: [5145]: info: rsc:vm-test2:40: migrate_to

Aug 26 15:55:58 xen-test1 crmd: [5148]: info: te_rsc_command: Initiating action 27: migrate_to vm-test2_migrate_to_0 on xen-test1 (local)

Aug 26 15:55:58 xen-test1 crmd: [5148]: info: process_lrm_event: LRM operation vm-test2_monitor_10000 (call=39, status=1, cib-update=0, confirmed=true) Cancelled

Aug 26 15:55:58 xen-test1 Xen[17077]: [17109]: INFO: vm-test2: Starting xm migrate to xen-test2


# "crm resource stop vm-test2" was run at this point

Aug 26 15:56:07 xen-test1 crmd: [5148]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=0) : Non-status change

Aug 26 15:56:07 xen-test1 cib: [5144]: info: log_data_element: cib:diff: + <nvpair id="vm-test2-meta_attributes-target-role" name="target-role" value="Stopped" __crm_diff_marker__="added:top" />

Aug 26 15:56:49 xen-test1 Xen[17077]: [17504]: INFO: vm-test2: xm migrate to xen-test2 succeeded.
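
To check whether a given log shows this sequence (target-role set to 
Stopped while an xm migrate is still in flight), something like the 
following could be used. It is only a sketch in Python: the patterns are 
copied from the log lines above and may not match other versions of the 
Xen agent or the cib, the nvpair id is assumed to follow the 
<resource>-meta_attributes-target-role form seen above, and the function 
name stops_during_migration is made up for illustration.

```python
import re

# Patterns copied from the log excerpt above (assumption: other
# resource-agents / pacemaker versions may word these lines differently).
MIGRATE_START = re.compile(r"INFO: (\S+): Starting xm migrate to (\S+)")
MIGRATE_DONE = re.compile(r"INFO: (\S+): xm migrate to (\S+) succeeded")
# Assumption: the nvpair id has the form "<resource>-meta_attributes-target-role".
ROLE_STOPPED = re.compile(
    r'id="(\S+)-meta_attributes-target-role".*value="Stopped"'
)


def stops_during_migration(lines):
    """Return (resource, target_node) pairs for which target-role was set
    to Stopped between the start and the end of an xm migrate."""
    in_flight = {}  # resource -> target node of a migration still running
    hits = []
    for line in lines:
        m = MIGRATE_START.search(line)
        if m:
            in_flight[m.group(1)] = m.group(2)
            continue
        m = MIGRATE_DONE.search(line)
        if m:
            in_flight.pop(m.group(1), None)
            continue
        m = ROLE_STOPPED.search(line)
        if m and m.group(1) in in_flight:
            hits.append((m.group(1), in_flight[m.group(1)]))
    return hits


# Three lines taken from the log above, in the order they were logged.
sample = [
    "Aug 26 15:55:58 xen-test1 Xen[17077]: [17109]: INFO: vm-test2: "
    "Starting xm migrate to xen-test2",
    '<nvpair id="vm-test2-meta_attributes-target-role" '
    'name="target-role" value="Stopped" __crm_diff_marker__="added:top" />',
    "Aug 26 15:56:49 xen-test1 Xen[17077]: [17504]: INFO: vm-test2: "
    "xm migrate to xen-test2 succeeded.",
]

print(stops_during_migration(sample))  # [('vm-test2', 'xen-test2')]
```

The scan works line by line, so it only reports what was logged; it 
cannot tell whether the VM really ended up running on the target node.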


cluster-glue-1.0.5-0.5.1
corosync-1.2.1-0.5.1
pacemaker-1.1.2-0.2.1
resource-agents-1.0.3-0.3.2


Thanks,
Mike



