[Pacemaker] Unexpected resource restarts after putting a node in standby mode

Florian Haas florian.haas at linbit.com
Mon Jul 6 02:00:38 EDT 2009


Hello everyone,

This is probably a bad time to ask, as Andrew is out on vacation, but
maybe Dejan or Dominik can help shed some light on this one.

I'm testing my iSCSITarget and iSCSILogicalUnit agents in a 2-node
Pacemaker 1.0.4 cluster. If you don't feel like grokking the full config
that follows, what I have is

- 2 DRBD Master/Slave resources;
- 2 resource groups, each holding one LVM VG, one iSCSITarget, and one
or more iSCSILogicalUnits;
- a cloned LSB resource managing the SCSI target daemon (tgt);
- order and colocation constraints to make sure that everything is
started in the right places and in the correct order.

Here is my full configuration; sorry for the noise, but I guess it
makes sense to include all of it:

node $id="3074cde6-2e91-4259-9868-7ac94007087e" alice \
	attributes standby="off"
node $id="9a4cafd3-fcfc-4de9-9440-10bc8822d9af" bob \
	attributes standby="off"
primitive res_drbd_iscsivg01 ocf:linbit:drbd \
	params drbd_resource="iscsivg01" \
	op monitor interval="10s"
primitive res_drbd_iscsivg02 ocf:linbit:drbd \
	params drbd_resource="iscsivg02"
primitive res_lu_iscsivg01_lun1 ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2001-04.com.linbit:storage.alicebob.iscsivg01" \
		lun="1" path="/dev/iscsivg01/lun1" scsi_id="iscsivg01.lun1" \
	op monitor interval="10s"
primitive res_lu_iscsivg01_lun2 ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2001-04.com.linbit:storage.alicebob.iscsivg01" \
		lun="2" path="/dev/iscsivg01/lun2" scsi_id="iscsivg01.lun2" \
	op monitor interval="10s"
primitive res_lu_iscsivg01_lun3 ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2001-04.com.linbit:storage.alicebob.iscsivg01" \
		lun="3" path="/dev/iscsivg01/lun3" scsi_id="iscsivg01.lun3" \
	op monitor interval="10s"
primitive res_lu_iscsivg02_lun1 ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2001-04.com.linbit:storage.alicebob.iscsivg02" \
		lun="1" path="/dev/iscsivg02/lun1" scsi_id="iscsivg02.lun1" \
	op monitor interval="10s"
primitive res_lvm_iscsivg01 ocf:heartbeat:LVM \
	params volgrpname="iscsivg01"
primitive res_lvm_iscsivg02 ocf:heartbeat:LVM \
	params volgrpname="iscsivg02"
primitive res_target_iscsivg01 ocf:heartbeat:iSCSITarget \
	params iqn="iqn.2001-04.com.linbit:storage.alicebob.iscsivg01" \
		additional_parameters="DefaultTime2Retain=60" \
	op monitor interval="10s"
primitive res_target_iscsivg02 ocf:heartbeat:iSCSITarget \
	params iqn="iqn.2001-04.com.linbit:storage.alicebob.iscsivg02" \
		additional_parameters="DefaultTime2Retain=60" \
	op monitor interval="10s"
primitive res_tgtd lsb:tgtd
group rg_iscsivg01 res_lvm_iscsivg01 res_target_iscsivg01 \
	res_lu_iscsivg01_lun1 res_lu_iscsivg01_lun2 res_lu_iscsivg01_lun3 \
	meta collocated="true" ordered="true" target-role="Started"
group rg_iscsivg02 res_lvm_iscsivg02 res_target_iscsivg02 \
	res_lu_iscsivg02_lun1
ms ms_drbd_iscsivg01 res_drbd_iscsivg01 \
	meta clone-max="2" clone-node-max="1" master-max="1" \
		master-node-max="1" target-role="Started" notify="true"
ms ms_drbd_iscsivg02 res_drbd_iscsivg02 \
	meta master-max="1" clone-max="2" clone-node-max="1" \
		master-node-max="1" notify="true" target-role="Started"
clone cl_tgtd res_tgtd \
	meta target-role="Started"
colocation c_iscsivg01_on_drbd inf: rg_iscsivg01 ms_drbd_iscsivg01:Master
colocation c_iscsivg01_on_tgtd inf: rg_iscsivg01 cl_tgtd
colocation c_iscsivg02_on_drbd inf: rg_iscsivg02 ms_drbd_iscsivg02:Master
colocation c_iscsivg02_on_tgtd inf: rg_iscsivg02 cl_tgtd
order o_drbd_before_iscsivg01 inf: ms_drbd_iscsivg01:promote rg_iscsivg01:start
order o_drbd_before_iscsivg02 inf: ms_drbd_iscsivg02:promote rg_iscsivg02:start
order o_tgtd_before_iscsivg01 inf: cl_tgtd rg_iscsivg01
order o_tgtd_before_iscsivg02 inf: cl_tgtd rg_iscsivg02
property $id="cib-bootstrap-options" \
	dc-version="1.0.4-6dede86d6105786af3a5321ccf66b44b6914f0aa" \
	cluster-infrastructure="Heartbeat" \
	stonith-enabled="false" \
	no-quorum-policy="ignore" \
	last-lrm-refresh="1246653472" \
	default-resource-stickiness="0"

Now, when I switch node bob into standby mode, its resources are
transferred to alice as expected. But, and this is the issue that I'm
having, the resource group that has been running on alice all along
(rg_iscsivg02) is restarted in place, needlessly, it seems.

I played this thing through with ptest:

cibadmin -Q \
| sed -e 's/id="nodes-9a4cafd3-fcfc-4de9-9440-10bc8822d9af-standby" value="off"/id="nodes-9a4cafd3-fcfc-4de9-9440-10bc8822d9af-standby" value="on"/' > /tmp/cib.xml
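
The sed substitution depends on the exact attribute order and spacing in
the XML. A minimal sketch of the same edit done on the parsed document,
assuming the standby nvpair id has the form "nodes-<node id>-standby" as
it does in this CIB (the function name set_standby is made up for
illustration and is not part of any Pacemaker tool):

```python
import xml.etree.ElementTree as ET


def set_standby(cib_xml: str, node_id: str, value: str) -> str:
    """Return the CIB XML with one node's standby nvpair set to `value`.

    Assumption: the nvpair id is "nodes-<node_id>-standby", matching the
    sed expression above. Other CIBs may name the nvpair differently.
    """
    root = ET.fromstring(cib_xml)
    wanted_id = f"nodes-{node_id}-standby"
    for nvpair in root.iter("nvpair"):
        if nvpair.get("id") == wanted_id:
            nvpair.set("value", value)
            return ET.tostring(root, encoding="unicode")
    # Fail loudly instead of silently handing ptest an unchanged CIB.
    raise KeyError(f"no nvpair with id {wanted_id!r} in this CIB")
```

Passing the output of cibadmin -Q through this function with bob's node
id and "on", and saving the result as /tmp/cib.xml, should give ptest the
same input as the sed pipeline.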

[root@alice ~]# ptest -VVV -x /tmp/cib.xml 2>&1 | grep LogActions
ptest[3695]: 2009/07/05_19:51:08 notice: LogActions: Demote res_drbd_iscsivg01:0	(Master -> Stopped bob)
ptest[3695]: 2009/07/05_19:51:08 notice: LogActions: Stop resource res_drbd_iscsivg01:0	(bob)
ptest[3695]: 2009/07/05_19:51:08 notice: LogActions: Promote res_drbd_iscsivg01:1	(Slave -> Master alice)
ptest[3695]: 2009/07/05_19:51:08 notice: LogActions: Move resource res_lvm_iscsivg01	(Started bob -> alice)
ptest[3695]: 2009/07/05_19:51:08 notice: LogActions: Move resource res_target_iscsivg01	(Started bob -> alice)
ptest[3695]: 2009/07/05_19:51:08 notice: LogActions: Move resource res_lu_iscsivg01_lun1	(Started bob -> alice)
ptest[3695]: 2009/07/05_19:51:08 notice: LogActions: Move resource res_lu_iscsivg01_lun2	(Started bob -> alice)
ptest[3695]: 2009/07/05_19:51:08 notice: LogActions: Move resource res_lu_iscsivg01_lun3	(Started bob -> alice)
ptest[3695]: 2009/07/05_19:51:08 notice: LogActions: Leave resource res_tgtd:0	(Started alice)
ptest[3695]: 2009/07/05_19:51:08 notice: LogActions: Stop resource res_tgtd:1	(bob)
ptest[3695]: 2009/07/05_19:51:08 notice: LogActions: Leave resource res_drbd_iscsivg02:0	(Master alice)
ptest[3695]: 2009/07/05_19:51:08 notice: LogActions: Stop resource res_drbd_iscsivg02:1	(bob)
ptest[3695]: 2009/07/05_19:51:08 notice: LogActions: Restart resource res_lvm_iscsivg02	(Started alice)
ptest[3695]: 2009/07/05_19:51:08 notice: LogActions: Restart resource res_target_iscsivg02	(Started alice)
ptest[3695]: 2009/07/05_19:51:08 notice: LogActions: Restart resource res_lu_iscsivg02_lun1	(Started alice)
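
To pick the in-place restarts out of a longer transition, the LogActions
lines can be parsed instead of read by eye. A rough sketch, assuming the
line format is exactly what ptest printed above (the function names are
illustrative only, and other Pacemaker versions may word these messages
differently):

```python
import re
from typing import List, Tuple

# One LogActions entry: an action word, an optional literal "resource",
# the resource name, and the parenthesised detail. \s+ also matches line
# breaks, so entries that a mail client wrapped onto two lines still parse.
LOG_ACTION = re.compile(
    r"LogActions:\s+(\w+)(?:\s+resource)?\s+(\S+)\s+\(([^)]*)\)"
)


def parse_log_actions(text: str) -> List[Tuple[str, str, str]]:
    """Return (action, resource, detail) for every LogActions entry."""
    return [m.groups() for m in LOG_ACTION.finditer(text)]


def restarted_in_place(text: str) -> List[str]:
    """Return the names of resources that ptest plans to restart."""
    return [
        name
        for action, name, _detail in parse_log_actions(text)
        if action == "Restart"
    ]
```

Applied to the output above, restarted_in_place() should list only the
three members of rg_iscsivg02.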

All those actions are fine, except for those restarts of the
rg_iscsivg02 resource group on alice. What am I doing wrong? I would
assume there must be a way to avoid these.

All comments much appreciated. Thanks!
Cheers,
Florian


