[Pacemaker] VirtualDomain/DRBD live migration with pacemaker...
Erich Weiler
weiler at soe.ucsc.edu
Mon Jun 14 16:37:04 EDT 2010
Hi All,
We have this interesting problem I was hoping someone could shed some
light on. Basically, we have 2 servers acting as a pacemaker cluster
for DRBD and VirtualDomain (KVM) resources under CentOS 5.5.
As it is set up, if one node dies, the other node promotes the DRBD
devices to "Master", then starts up the VMs there (there is one DRBD
device for each VM). This works great. I set
'resource-stickiness="100"', and the VM's location preference score is
50, so that if a VM moves to the other server, it stays there until I
specifically move it back manually.
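To make the scoring concrete, here is a rough sketch of the comparison as I
understand it. It is a simplification, not pacemaker's actual placement
algorithm: it assumes only the stickiness (counted for the node currently
running the VM) and the location score (counted for vmserver1) matter. The
pick_node helper is made up for illustration.

```shell
#!/bin/sh
# pick_node STICKINESS LOCATION_SCORE
# Simplified model (an assumption, not pacemaker's real algorithm):
# the VM is currently running on vmserver2, which earns the stickiness
# score; vmserver1 earns only the location preference score.
pick_node() {
    stickiness=$1
    location=$2
    if [ "$location" -gt "$stickiness" ]; then
        echo vmserver1    # preference outweighs stickiness: VM moves back
    else
        echo vmserver2    # otherwise the VM stays (a tie counts as "stay" in this sketch)
    fi
}

pick_node 100 50    # the two values from my configuration
```

With 100 against 50 the sketch picks vmserver2, which matches what I see: the
VM stays where it is after a failover.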
Now... In the event of a failure of one server, all the VMs go to the
other server. When I fix the broken server and bring it back online,
the VMs do not migrate back automatically because of the scoring I
mentioned above. I wanted this because when the VM goes back, it
essentially has to shut down, then reboot on the other node. I'm trying
to avoid the 'shut down' part of it and do a live migration back to the
first server. But I cannot figure out the exact sequence of steps that
does this without pacemaker rebooting the VM somewhere in the
process. This is my configuration, with one VM called 'caweb':
node vmserver1
node vmserver2
primitive caweb-vd ocf:heartbeat:VirtualDomain \
params config="/etc/libvirt/qemu/caweb.xml" hypervisor="qemu:///system" \
meta allow-migrate="false" target-role="Started" \
op start interval="0" timeout="120s" \
op stop interval="0" timeout="120s" \
op monitor interval="10" timeout="30" depth="0"
primitive drbd-caweb ocf:linbit:drbd \
params drbd_resource="caweb" \
op monitor interval="15s" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="100s"
ms ms-drbd-caweb drbd-caweb \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
location caweb-prefers-vmserver1 caweb-vd 50: vmserver1
colocation caweb-vd-on-drbd inf: caweb-vd ms-drbd-caweb:Master
order caweb-after-drbd inf: ms-drbd-caweb:promote caweb-vd:start
property $id="cib-bootstrap-options" \
dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1276538859"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
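For comparison, moving the VM back through pacemaker itself would look
roughly like the sketch below. With allow-migrate="false" as configured
above, this gives the stop-then-start behaviour I am trying to avoid, not a
live migration. The run wrapper and the DRY_RUN variable are made up for
illustration; by default the script only prints the commands.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) only prints each command;
# set DRY_RUN=0 to actually execute it on a cluster node.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Ask pacemaker to move the VM resource to vmserver1.
move_to_vmserver1() {
    run crm resource migrate caweb-vd vmserver1
}

# Remove the constraint left behind by 'migrate'. Only meant to be run
# once caweb-vd is reported as started on vmserver1.
clear_move_constraint() {
    run crm resource unmigrate caweb-vd
}

move_to_vmserver1
clear_move_constraint
```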
One thing I tried, in an effort to do a live migration from vmserver2 to
vmserver1 and afterward tell pacemaker to 're-acquire' the current state
of things without a VM reboot, was:
vmserver1# crm resource unmanage caweb-vd
vmserver1# crm resource unmanage ms-drbd-caweb
vmserver1# drbdadm primary caweb <--make dual primary
(then back on vmserver2...)
vmserver2# virsh migrate --live caweb qemu+ssh://hgvmserver1.local/system
vmserver2# drbdadm secondary caweb <--disable dual primary
vmserver2# crm resource manage ms-drbd-caweb
vmserver2# crm resource manage caweb-vd
vmserver2# crm resource cleanup ms-drbd-caweb
vmserver2# crm resource cleanup caweb-vd
vmserver2# crm resource refresh
vmserver2# crm resource reprobe
vmserver2# crm resource start caweb-vd
at this point the VM has live migrated and is still online.
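A sketch of how the state can be checked on each node at this point,
assuming virsh, drbdadm and crm_mon are in the PATH. The check helper is
made up for illustration and just skips any tool that is not installed.

```shell
#!/bin/sh
# check CMD [ARGS...]: run CMD if it is installed, otherwise say it was skipped.
check() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped: $1 not installed"
    fi
}

check virsh domstate caweb    # libvirt's view of the VM on this node
check drbdadm role caweb      # DRBD role of the 'caweb' resource on this node
check crm_mon -1              # one-shot snapshot of pacemaker's view
```

Comparing the three shows whether pacemaker's view agrees with where the VM
and the DRBD primary actually are.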
[wait 120 seconds for caweb-vd start timeouts to expire]
For a moment I thought it had worked, but then pacemaker put the resource
into an error state and the VM was shut down... After bringing resources
back into 'managed' mode, is there any way to tell pacemaker to 'figure
things out' without restarting them? Or is this impossible
because the VM resource depends on the DRBD resource, and pacemaker has
trouble figuring out stacked resources without restarting them?
Or - does anyone know another way to manually live migrate a
pacemaker/VirtualDomain managed VM (with DRBD) without having to reboot
the VM after the live migrate?
Thanks in advance for any clues!! BTW, I am using pacemaker 1.0.8 and
DRBD 8.3.
Cheers,
-erich