[Pacemaker] Stickiness, scoring, and the startled herd

Matthew Palmer mpalmer at hezmatt.org
Sun Sep 27 02:32:02 EDT 2009


Hi all,

I've got a cluster of three Xen dom0s (running VMs managed by pacemaker with
DRBD in the dom0s for the VM disks) that I'm trying to get working in a
stable fashion, but I'm having a hard time avoiding what I've dubbed the
"startled herd" problem.

Basically, once the allocation of VMs is in a stable state, the whole
cluster sits there quite happily and the VMs run nicely.  However, the
moment *anything* about the cluster changes (adding a new VM, a new
constraint on a VM -- practically *anything*), all of the VMs start
stopping and starting themselves, and everything just gets badly out of hand
for a few minutes.  Obviously, this isn't particularly highly available, and
I really need to stop it.

It seemed like stickiness was the solution to my problem -- crank the
stickiness up high enough, and things have to stay put.  However, even with
a stickiness of 100000, the damn things just won't stay put.
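
For what it's worth, here's roughly how I understood stickiness is meant to
be set -- either per resource as a meta attribute, or as a cluster-wide
default via rsc_defaults.  This is from memory, so treat it as a sketch
rather than gospel; as you'll see in the config below, I've actually got
resource-stickiness sitting under params on each primitive, and I'm not
certain whether that amounts to the same thing:

# What I gather a primitive with stickiness as a meta attribute would
# look like in "crm configure show":
primitive vm1_vm ocf:heartbeat:Xen \
	op monitor interval="10s" \
	params xmfile="/etc/xen/vm1.cfg" \
	meta resource-stickiness="100000"

# ...and, supposedly, a cluster-wide default can be set with:
crm configure rsc_defaults resource-stickiness="100000"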

As an example: One of the servers (xen1) got rebooted, and everything moved
around.  So, I cranked up the stickiness to 999999 in an attempt to keep
everything where it was -- and when xen1 came back, everything stayed where
it was (WIN!).  But then I inserted the location rule for one of the VMs to
move to xen1, and it didn't move.  OK, fair enough, I've effectively made
everything infinitely sticky -- so I dropped the stickiness on everything
back to 100000, and *bam*, the next thing I know I've got 3 VMs on xen1, an
extra
2 VMs on xen2, and now xen3 is completely empty.
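
In case it's relevant: the location rule in question was, I'm fairly sure,
the one the shell's migrate command creates -- roughly this, though I may be
misremembering the exact incantation:

crm resource migrate vm4_vm xen1
# ...which, as far as I can tell, is what left this constraint behind:
# location cli-prefer-vm4_vm vm4_vm \
#	rule $id="cli-prefer-rule-vm4_vm" inf: #uname eq xen1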

How the hell did that happen?  What am I doing wrong?

On a related topic, is there any way to find out what the cluster's scores
for all resources are, and how it came to calculate those scores?  The logs
are full of trivia, but they lack this sort of really useful detail.  I
assume there'd be some sort of tool I could run against a CIB XML file and
see what's going on, but I can't seem to find anything.
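
The sort of thing I'm imagining is something along these lines -- though I
have no idea whether ptest actually behaves this way, so consider it
wishful thinking rather than anything I've verified:

# Dump the live CIB, then ask the policy engine what scores it would
# assign for that configuration:
cibadmin -Q > /tmp/cib.xml
ptest -x /tmp/cib.xml -s -VV
# (or maybe "ptest -L -s" to read the live CIB directly?)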

System info: Pacemaker 1.0.4, Heartbeat 2.99.2+sles11r1, running on Debian
Lenny.

Any help greatly appreciated -- whether it be docs (I've read Configuration
Explained and Colocation Explained, but there doesn't seem to be much else
out there), extra diagnostic commands I can run to examine the system state,
or just a simple "you need to set fooble to 17".

For reference, here's the output of crm configure show as it currently
stands (I'd provide log output, except there's so much of it I have no idea
what to chop -- I can't even recognise where one thing ends and the next
begins):

node $id="046fdbe2-40a8-41a8-bfd9-a62504fe7954" xen3
node $id="6267c66f-5824-4da6-b9f7-3ddfff35aab3" xen2
node $id="80177967-82ec-415b-b6ff-ec3a9de315c7" xen1
primitive vm1_disk ocf:linbit:drbd \
	op monitor interval="10s" \
	params drbd_resource="vm1_disk" resource-stickiness="100000"
primitive vm1_vm ocf:heartbeat:Xen \
	op monitor interval="10s" \
	op stop interval="0" timeout="300s" \
	params xmfile="/etc/xen/vm1.cfg" resource-stickiness="100000"
primitive vm2_disk ocf:linbit:drbd \
	op monitor interval="10s" \
	params drbd_resource="vm2_disk" resource-stickiness="100000"
primitive vm2_vm ocf:heartbeat:Xen \
	op monitor interval="10s" \
	op stop interval="0" timeout="300s" \
	params xmfile="/etc/xen/vm2.cfg" resource-stickiness="100000"
primitive vm3_disk ocf:linbit:drbd \
	op monitor interval="10s" \
	params drbd_resource="vm3_disk" resource-stickiness="100000"
primitive vm3_vm ocf:heartbeat:Xen \
	op monitor interval="10s" \
	op stop interval="0" timeout="300s" \
	params xmfile="/etc/xen/vm3.cfg" resource-stickiness="100000"
primitive vm4_disk ocf:linbit:drbd \
	op monitor interval="10s" \
	params drbd_resource="vm4_disk" resource-stickiness="100000"
primitive vm4_vm ocf:heartbeat:Xen \
	op monitor interval="10s" \
	op stop interval="0" timeout="300s" \
	params xmfile="/etc/xen/vm4.cfg" resource-stickiness="100000"
primitive vm5_disk ocf:linbit:drbd \
	op monitor interval="10s" \
	params drbd_resource="vm5_disk" resource-stickiness="100000"
primitive vm5_vm ocf:heartbeat:Xen \
	op monitor interval="10s" \
	op stop interval="0" timeout="300s" \
	params xmfile="/etc/xen/vm5.cfg" resource-stickiness="100000"
primitive vm6_disk ocf:linbit:drbd \
	op monitor interval="10s" \
	params drbd_resource="vm6_disk" resource-stickiness="100000"
primitive vm6_vm ocf:heartbeat:Xen \
	op monitor interval="10s" \
	op stop interval="0" timeout="300s" \
	params xmfile="/etc/xen/vm6.cfg" resource-stickiness="100000"
primitive vm7_disk ocf:linbit:drbd \
	op monitor interval="10s" \
	params resource-stickiness="100000" drbd_resource="vm7_disk"
primitive vm7_vm ocf:heartbeat:Xen \
	op monitor interval="10s" \
	params resource-stickiness="100000" xmfile="/etc/xen/vm7.cfg"
primitive vm8_disk ocf:linbit:drbd \
	op monitor interval="10s" \
	params drbd_resource="vm8_disk" resource-stickiness="100000"
primitive vm8_vm ocf:heartbeat:Xen \
	op monitor interval="10s" \
	op stop interval="0" timeout="300s" \
	params xmfile="/etc/xen/vm8.cfg" resource-stickiness="100000"
primitive stonith-xen1 stonith:external/ipmi \
	op monitor interval="5s" \
	params interface="lan" passwd="s3kr1t" userid="stonith" ipaddr="10.0.0.1" hostname="xen1"
primitive stonith-xen2 stonith:external/ipmi \
	op monitor interval="5s" \
	params ipaddr="10.0.0.2" userid="stonith" passwd="s3kr1t" interface="lan" hostname="xen2"
primitive stonith-xen3 stonith:external/ipmi \
	op monitor interval="5s" \
	params interface="lan" hostname="xen3" passwd="s3kr1t" userid="stonith" ipaddr="10.0.0.3"
primitive vm9_disk ocf:linbit:drbd \
	op monitor interval="10s" \
	params resource-stickiness="100000" drbd_resource="vm9_disk"
primitive vm9_vm ocf:heartbeat:Xen \
	op monitor interval="10s" \
	params resource-stickiness="100000" xmfile="/etc/xen/vm9.cfg"
ms ms_vm1_disk vm1_disk \
	meta master-max="1" clone-max="2" master-node-max="1" notify="true" clone-node-max="1"
ms ms_vm2_disk vm2_disk \
	meta master-max="1" notify="true" clone-node-max="1" master-node-max="1" clone-max="2"
ms ms_vm3_disk vm3_disk \
	meta master-node-max="1" master-max="1" clone-node-max="1" notify="true" clone-max="2"
ms ms_vm4_disk vm4_disk \
	meta master-node-max="1" clone-node-max="1" notify="true" clone-max="2" master-max="1"
ms ms_vm5_disk vm5_disk \
	meta notify="true" clone-max="2" master-node-max="1" master-max="1" clone-node-max="1"
ms ms_vm6_disk vm6_disk \
	meta clone-node-max="1" clone-max="2" master-max="1" master-node-max="1" notify="true"
ms ms_vm7_disk vm7_disk \
	meta clone-node-max="1" clone-max="2" notify="true" master-node-max="1" master-max="1"
ms ms_vm8_disk vm8_disk \
	meta master-max="1" clone-node-max="1" master-node-max="1" notify="true" clone-max="2"
ms ms_vm9_disk vm9_disk \
	meta notify="true" clone-max="2" master-max="1" master-node-max="1" clone-node-max="1"
clone clone-stonith-xen1 stonith-xen1 \
	meta clone-node-max="1" globally-unique="false" interleave="false" notify="false" ordered="false" clone-max="2"
clone clone-stonith-xen2 stonith-xen2 \
	meta ordered="false" notify="false" globally-unique="false" clone-node-max="1" interleave="false" clone-max="2"
clone clone-stonith-xen3 stonith-xen3 \
	meta notify="false" globally-unique="false" ordered="false" interleave="false" clone-node-max="1" clone-max="2"
location vm1_vm_dom0s ms_vm1_disk \
	rule $id="vm1_vm_dom0s-rule" -inf: #uname ne xen1 and #uname ne xen3
location vm2_vm_dom0s ms_vm2_disk \
	rule $id="vm2_vm_dom0s-rule" -inf: #uname ne xen1 and #uname ne xen3
location cli-prefer-vm4_vm vm4_vm \
	rule $id="cli-prefer-rule-vm4_vm" inf: #uname eq xen1
location clone-stonith-xen1-not-on-xen1 clone-stonith-xen1 -inf: xen1
location clone-stonith-xen2-not-on-xen2 clone-stonith-xen2 -inf: xen2
location clone-stonith-xen3-not-on-xen3 clone-stonith-xen3 -inf: xen3
location vm3_vm_dom0s ms_vm3_disk \
	rule $id="vm3_vm_dom0s-rule" -inf: #uname ne xen2 and #uname ne xen3
location vm4_vm_dom0s ms_vm4_disk \
	rule $id="vm4_vm_dom0s-rule" -inf: #uname ne xen1 and #uname ne xen3
location vm5_vm_dom0s ms_vm5_disk \
	rule $id="vm5_vm_dom0s-rule" -inf: #uname ne xen2 and #uname ne xen3
location vm6_vm_dom0s ms_vm6_disk \
	rule $id="vm6_vm_dom0s-rule" -inf: #uname ne xen2 and #uname ne xen1
location vm7_vm_dom0s ms_vm7_disk \
	rule $id="vm7_vm_dom0s-rule" -inf: #uname ne xen2 and #uname ne xen1
location vm8_vm_dom0s ms_vm8_disk \
	rule $id="vm8_vm_dom0s-rule" -inf: #uname ne xen3 and #uname ne xen2
location vm9_vm_dom0s ms_vm9_disk \
	rule $id="vm9_vm_dom0s-rule" -inf: #uname ne xen2 and #uname ne xen1
colocation vm1_disk_with_vm inf: vm1_vm ms_vm1_disk:Master
colocation vm2_disk_with_vm inf: vm2_vm ms_vm2_disk:Master
colocation vm3_disk_with_vm inf: vm3_vm ms_vm3_disk:Master
colocation vm4_disk_with_vm inf: vm4_vm ms_vm4_disk:Master
colocation vm5_disk_with_vm inf: vm5_vm ms_vm5_disk:Master
colocation vm6_disk_with_vm inf: vm6_vm ms_vm6_disk:Master
colocation vm7_disk_with_vm inf: vm7_vm ms_vm7_disk:Master
colocation vm8_disk_with_vm inf: vm8_vm ms_vm8_disk:Master
colocation vm9_disk_with_vm inf: vm9_vm ms_vm9_disk:Master
order start_vm_after_vm1_disk inf: ms_vm1_disk:promote vm1_vm:start
order start_vm_after_vm2_disk inf: ms_vm2_disk:promote vm2_vm:start
order start_vm_after_vm3_disk inf: ms_vm3_disk:promote vm3_vm:start
order start_vm_after_vm4_disk inf: ms_vm4_disk:promote vm4_vm:start
order start_vm_after_vm5_disk inf: ms_vm5_disk:promote vm5_vm:start
order start_vm_after_vm6_disk inf: ms_vm6_disk:promote vm6_vm:start
order start_vm_after_vm7_disk inf: ms_vm7_disk:promote vm7_vm:start
order start_vm_after_vm8_disk inf: ms_vm8_disk:promote vm8_vm:start
order start_vm_after_vm9_disk inf: ms_vm9_disk:promote vm9_vm:start
property $id="cib-bootstrap-options" \
	dc-version="1.0.4-6dede86d6105786af3a5321ccf66b44b6914f0aa" \
	cluster-infrastructure="Heartbeat" \
	last-lrm-refresh="1254021457" \
	stonith-enabled="true" \
	stonith-action="poweroff"



