[Pacemaker] Xen, Pacemaker and live migration

Daugherity, Andrew W adaugherity at tamu.edu
Mon Jan 23 17:04:38 EST 2012


> Date: Mon, 23 Jan 2012 07:24:58 +0100
> From: Frank Meier <frank.meier at hr-group.de>
> To: The Pacemaker cluster resource manager
> 	<pacemaker at oss.clusterlabs.org>
> Subject: Re: [Pacemaker] Xen, Pacemaker and live migration
> Message-ID: <4F1CFD3A.7020200 at hr-group.de>
> Content-Type: text/plain; charset="ISO-8859-1"
> 
> Hi,
> 
> there is a clvm:0 on the first and a clvm:1 on the second node. So
> it's OK, isn't it?

It's OK in the sense that it's running on every node, yes, but it also hints at the problem.  I encountered this too when setting up my Xen cluster.

With a colocation directive like yours:
>> colocation VM1WithLVM1forVM1 inf: VM1 LVMforVM1

VM1 will get colocated with a specific clone instance, say LVMforVM1:0. The migration attempt will then look for LVMforVM1:0 on the other node and not find it (because the instance there is LVMforVM1:1).  Pacemaker will then bail on live migration, stop the VM, and start it on the target node, where it will now be colocated with LVMforVM1:1.

For the same reason, you don't want to activate the LV or VG exclusively.
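With ocf:heartbeat:LVM, that means leaving exclusive activation off; a minimal sketch (the resource and VG names here are illustrative, not from your config):
====
# Leave "exclusive" unset or false so every node can activate the VG.
primitive lvm-for-vm1 ocf:heartbeat:LVM \
	params volgrpname="vg_vm1" exclusive="false"
====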

The solution is to use an order constraint instead of colocation.  The VM doesn't need to be colocated with a specific cLVM instance; it just needs to start after cLVM is running.
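In crm syntax, something like this would replace your colocation (untested; the constraint name is illustrative, and I'm assuming LVMforVM1 is the clone):
====
# Order-only: start VM1 after the cloned LVM resource is up, without
# pinning it to a particular clone instance.
order VM1afterLVMforVM1 inf: LVMforVM1 VM1
====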

Also, as others have mentioned, using groups will simplify your configuration.  Make sure, too, that your migrate_to timeout is long enough (migrate_from just verifies that the VM is running on the target, and should complete almost instantly).

For example, I have:
====
primitive clvm ocf:lvm2:clvmd \
	params daemon_timeout="30" \
	op start interval="0" timeout="90" \
	op stop interval="0" timeout="100"
primitive clvm-xenvg ocf:heartbeat:LVM \
	params volgrpname="xen_san"
primitive cmirror ocf:lvm2:cmirrord \
	params daemon_timeout="30" \
	op start interval="0" timeout="90" \
	op stop interval="0" timeout="100"
primitive dlm ocf:pacemaker:controld \
	op start interval="0" timeout="90" \
	op stop interval="0" timeout="100"
primitive fs-xen ocf:heartbeat:Filesystem \
	params device="/dev/xen_san/meta" directory="/mnt/xen" fstype="ocfs2" \
	op start interval="0" timeout="60" \
	op stop interval="0" timeout="60" \
	op monitor interval="20" timeout="40"
primitive o2cb ocf:ocfs2:o2cb \
	op start interval="0" timeout="90" \
	op stop interval="0" timeout="100"

primitive vm-webdev ocf:heartbeat:Xen \
	params xmfile="/mnt/xen/vm/webdev" \
	meta allow-migrate="true" target-role="Started" is-managed="true" \
	utilization cores="2" mem="1024" \
	op start interval="0" timeout="60" \
	op stop interval="0" timeout="60" \
	op migrate_to interval="0" timeout="180" \
	op monitor interval="30" timeout="30" start-delay="60"
(etc.)

group clvm-glue dlm clvm o2cb cmirror \
	meta target-role="started"
group xen-vg-fs clvm-xenvg fs-xen \
	meta target-role="started"
clone c-clvm-glue clvm-glue \
	meta interleave="true" ordered="true"
clone c-xen-vg-fs xen-vg-fs \
	meta interleave="true" ordered="true"
colocation colo-clvmglue-xenvgfs inf: c-xen-vg-fs c-clvm-glue
order o-clvmglue-xenvgfs inf: c-clvm-glue c-xen-vg-fs

order o-webdev inf: c-xen-vg-fs vm-webdev
(etc.)
====
Each Xen VM resource has a corresponding order constraint starting it after the cLVM VG is active.  The only reason I split the cLVM stack into two groups is so I can stop my fs-xen resource (an OCFS2 filesystem, stored on cLVM, where I keep my Xen config files and lock files) without stopping clvmd entirely.  This is important if I ever have to unmount and fsck it.
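As a sketch of that maintenance procedure (standard crm shell commands; the device path is from the config above, and note that VMs ordered after c-xen-vg-fs get stopped too):
====
crm resource stop c-xen-vg-fs    # cLVM glue clone keeps running
fsck.ocfs2 /dev/xen_san/meta     # check the OCFS2 volume while unmounted
crm resource start c-xen-vg-fs
====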


Andrew Daugherity
Systems Analyst
Division of Research, Texas A&M University