[Pacemaker] colocation conundrum

Craig Donnelly craig at goaf.net
Fri Nov 23 10:06:50 UTC 2012


I've finally managed to resolve this issue by rejigging the iSCSI resources.
I reduced the number of iSCSI target resources to one per node, which then allowed me to remove the groups.
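
With several logical units now sharing one target per node, each unit needs its own LUN number on that target (in the earlier config every unit sat on its own target and could use lun="1"). A rough Python sketch of that sanity check follows; the helper name duplicate_luns is made up for illustration and is not part of Pacemaker or crmsh, and the values are copied by hand from the iSCSILogicalUnit primitives in this config.

```python
from collections import Counter

# (resource name, target IQN, LUN), copied by hand from the
# iSCSILogicalUnit primitives in this message's configuration.
LOGICAL_UNITS = [
    ("cs1ddb1-1", "iqn.2012-10.com.xyz:cs1san1", 1),
    ("cs1dws1-1", "iqn.2012-10.com.xyz:cs1san1", 2),
    ("cs1lb1-1", "iqn.2012-10.com.xyz:cs1san1", 3),
    ("cs1man1-1", "iqn.2012-10.com.xyz:cs1san1", 4),
    ("cs1master1-1", "iqn.2012-10.com.xyz:cs1san1", 5),
    ("cs1pws1-1", "iqn.2012-10.com.xyz:cs1san1", 6),
    ("cs1pdb1-1", "iqn.2012-10.com.xyz:cs1san1", 7),
    ("cs1lb2-1", "iqn.2012-10.com.xyz:cs1san2", 1),
    ("cs1ddb2-1", "iqn.2012-10.com.xyz:cs1san2", 2),
    ("cs1dws2-1", "iqn.2012-10.com.xyz:cs1san2", 3),
    ("cs1pdb2-1", "iqn.2012-10.com.xyz:cs1san2", 4),
    ("cs1pws2-1", "iqn.2012-10.com.xyz:cs1san2", 5),
]


def duplicate_luns(units):
    """Return sorted (target_iqn, lun) pairs claimed by more than one unit."""
    counts = Counter((iqn, lun) for _name, iqn, lun in units)
    return sorted(pair for pair, n in counts.items() if n > 1)


# An empty list means no two logical units collide on the same target.
print(duplicate_luns(LOGICAL_UNITS))
```

This only checks the numbering; it says nothing about whether the targets themselves start in the right order.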

The following config is now stable for me.

node cs1san1 \
	attributes standby="off"
node cs1san2 \
	attributes standby="off"
primitive alert ocf:heartbeat:MailTo \
	params email="ops at xyz.com" subject="CS takeover event" \
	op monitor interval="10s"
primitive cs1ddb1-1 ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2012-10.com.xyz:cs1san1" lun="1" path="/dev/cs1vg1/cs1ddb1-1"
primitive cs1ddb2-1 ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2012-10.com.xyz:cs1san2" lun="2" path="/dev/cs1vg2/cs1ddb2-1" \
	meta target-role="Started"
primitive cs1dws1-1 ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2012-10.com.xyz:cs1san1" lun="2" path="/dev/cs1vg1/cs1dws1-1"
primitive cs1dws2-1 ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2012-10.com.xyz:cs1san2" lun="3" path="/dev/cs1vg2/cs1dws2-1"
primitive cs1lb1-1 ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2012-10.com.xyz:cs1san1" lun="3" path="/dev/cs1vg1/cs1lb1-1" \
	meta target-role="Started"
primitive cs1lb2-1 ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2012-10.com.xyz:cs1san2" lun="1" path="/dev/cs1vg2/cs1lb2-1" \
	meta target-role="Started"
primitive cs1man1-1 ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2012-10.com.xyz:cs1san1" lun="4" path="/dev/cs1vg1/cs1man1-1"
primitive cs1master1-1 ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2012-10.com.xyz:cs1san1" lun="5" path="/dev/cs1vg1/cs1master1-1"
primitive cs1pdb1-1 ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2012-10.com.xyz:cs1san1" lun="7" path="/dev/cs1vg1/cs1pdb1-1"
primitive cs1pdb2-1 ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2012-10.com.xyz:cs1san2" lun="4" path="/dev/cs1vg2/cs1pdb2-1"
primitive cs1pws1-1 ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2012-10.com.xyz:cs1san1" lun="6" path="/dev/cs1vg1/cs1pws1-1"
primitive cs1pws2-1 ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2012-10.com.xyz:cs1san2" lun="5" path="/dev/cs1vg2/cs1pws2-1"
primitive cs1vg1 ocf:heartbeat:LVM \
	params exclusive="true" volgrpname="cs1vg1" \
	op start interval="0" timeout="30s" \
	op stop interval="0" timeout="30s" \
	meta target-role="Started"
primitive cs1vg2 ocf:heartbeat:LVM \
	params exclusive="true" volgrpname="cs1vg2" \
	op start interval="0" timeout="30s" \
	op stop interval="0" timeout="30s" \
	meta target-role="Started"
primitive ping ocf:pacemaker:ping \
	params host_list="10.96.0.1 10.96.0.2" attempts="3" timeout="2s" multiplier="100" dampen="5s" \
	op monitor interval="10s"
primitive san1fencer stonith:fence_ipmilan \
	params pcmk_host_list="cs1san1" lanplus="1" ipaddr="10.96.0.21" login="admin" passwd="*******" power_wait="4s" \
	op monitor interval="60s" \
	meta target-role="Started"
primitive san1tgt ocf:heartbeat:iSCSITarget \
	params iqn="iqn.2012-10.com.xyz:cs1san1" tid="1" \
	op monitor interval="10" timeout="15"
primitive san1vip ocf:heartbeat:IPaddr2 \
	params ip="10.94.0.101" cidr_netmask="24" \
	op monitor interval="10s" \
	meta target-role="Started"
primitive san2fencer stonith:fence_ipmilan \
	params pcmk_host_list="cs1san2" lanplus="1" ipaddr="10.96.0.22" login="admin" passwd="*******" power_wait="4s" \
	op monitor interval="60s" \
	meta target-role="Started"
primitive san2tgt ocf:heartbeat:iSCSITarget \
	params iqn="iqn.2012-10.com.xyz:cs1san2" tid="2" \
	op monitor interval="10" timeout="15"
primitive san2vip ocf:heartbeat:IPaddr2 \
	params ip="10.94.0.102" cidr_netmask="24" \
	op monitor interval="10s" \
	meta target-role="Started"
clone alerts alert \
	meta target-role="Started"
clone pings ping \
	meta target-role="Started"
location san1fence san1fencer -inf: cs1san1
location san1loc cs1vg1 \
	rule $id="san1loc-rule1" 50: #uname eq cs1san1 \
	rule $id="san1loc-rule2" pingd: defined ping
location san2fence san2fencer -inf: cs1san2
location san2loc cs1vg2 \
	rule $id="san2loc-rule1" 50: #uname eq cs1san2 \
	rule $id="san2loc-rule2" pingd: defined ping
colocation san1colo inf: ( cs1lb1-1 cs1master1-1 cs1man1-1 cs1ddb1-1 cs1dws1-1 cs1pws1-1 cs1pdb1-1 ) san1tgt san1vip cs1vg1
colocation san2colo inf: ( cs1lb2-1 cs1ddb2-1 cs1dws2-1 cs1pws2-1 cs1pdb2-1 ) san2tgt san2vip cs1vg2
order san1order inf: cs1vg1 san1vip ( cs1lb1-1 cs1man1-1 cs1master1-1 cs1ddb1-1 cs1dws1-1 cs1pws1-1 cs1pdb1-1 )
order san2order inf: cs1vg2 san2vip san2tgt ( cs1lb2-1 cs1ddb2-1 cs1dws2-1 cs1pws2-1 cs1pdb2-1 )
property $id="cib-bootstrap-options" \
	dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
	cluster-infrastructure="openais" \
	expected-quorum-votes="2" \
	no-quorum-policy="ignore" \
	last-lrm-refresh="1353662425" \
	stonith-enabled="true" \
	maintenance-mode="off"
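
Since the original problem looked like a mismatch between colocation and ordering, here is a small Python sketch that compares the resources named in a colocation line with those named in the matching order line. It is only a naive token split of the crm shell syntax shown above, not a real parser, and the function names are invented for illustration.

```python
def constraint_resources(line):
    """Return the set of resource names in a crm 'colocation' or 'order' line."""
    tokens = line.replace("(", " ").replace(")", " ").split()
    # tokens[0] is the keyword, tokens[1] the constraint id, tokens[2] the score.
    return set(tokens[3:])


def unordered_resources(colocation_line, order_line):
    """Resources that are colocated but do not appear in the order constraint."""
    missing = constraint_resources(colocation_line) - constraint_resources(order_line)
    return sorted(missing)


san1colo = ("colocation san1colo inf: ( cs1lb1-1 cs1master1-1 cs1man1-1 cs1ddb1-1 "
            "cs1dws1-1 cs1pws1-1 cs1pdb1-1 ) san1tgt san1vip cs1vg1")
san1order = ("order san1order inf: cs1vg1 san1vip ( cs1lb1-1 cs1man1-1 cs1master1-1 "
             "cs1ddb1-1 cs1dws1-1 cs1pws1-1 cs1pdb1-1 )")
san2colo = ("colocation san2colo inf: ( cs1lb2-1 cs1ddb2-1 cs1dws2-1 cs1pws2-1 "
            "cs1pdb2-1 ) san2tgt san2vip cs1vg2")
san2order = ("order san2order inf: cs1vg2 san2vip san2tgt ( cs1lb2-1 cs1ddb2-1 "
             "cs1dws2-1 cs1pws2-1 cs1pdb2-1 )")

print(unordered_resources(san1colo, san1order))  # ['san1tgt']
print(unordered_resources(san2colo, san2order))  # []
```

Run against the constraints above, it reports san1tgt as colocated in san1colo but absent from san1order, whereas san2tgt does appear in san2order. Whether that difference matters in practice is not something this check can tell.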



On 20 Nov 2012, at 19:40, Craig Donnelly wrote:

> Sorry, yes, those are typos - I'm having a bad day ;)
> 
> Should have been:
> 
> order san1order inf: cs1vg1 san1vip ( cs1lb1grp cs1man1grp cs1master1grp cs1ddb1grp cs1dws1grp ) 
> order san2order inf: cs1vg2 san2vip ( cs1lb2grp ) 
> 
> 
> On 20 Nov 2012, at 19:16, Jake Smith wrote:
> 
>> 
>> 
>> 
>> ----- Original Message -----
>>> From: "Craig Donnelly" <craig at goaf.net>
>>> To: pacemaker at oss.clusterlabs.org
>>> Sent: Tuesday, November 20, 2012 1:56:03 PM
>>> Subject: [Pacemaker] colocation conundrum
>>> 
>>> Hi there,
>>> 
>>> I think I've exhausted everything I can find online in terms of trying
>>> to solve my problem, so here goes with a posting to see if anyone on
>>> this mailing list might be able to help, please.
>>> 
>>> I have a Pacemaker 1.1.7 / Corosync 1.4.1 two-node cluster running on
>>> CentOS 6.3.
>>> I'm using this cluster to support shared storage using a combination
>>> of LVM and iSCSI.
>>> 
>>> Failover works fine if I offline/stonith a node. However, when I
>>> bring the node back online, the two nodes enter a death-match situation.
>>> I see the issue as being with ordering/colocation/resource sets. I
>>> have tried a bunch of different variations and read and re-read all
>>> the information I can find online, without resolution.
>>> 
>>> I would really appreciate any help/advice.
>>> 
>>> The key entries that I can see in the logs are:
>>> 
>>> NODE1:
>>> ======
>>> Nov 20 12:16:38 cs1san1 iSCSILogicalUnit(cs1lb1l1)[2710]: ERROR: tgtadm: invalid request
>>> Nov 20 12:16:39 cs1san1 iSCSILogicalUnit(cs1man1l1)[2807]: ERROR: tgtadm: invalid request
>>> Nov 20 12:22:55 cs1san1 iSCSILogicalUnit(cs1master1l1)[4482]: ERROR: tgtadm: invalid request
>>> Nov 20 12:23:17 cs1san1 iSCSILogicalUnit(cs1ddb1l1)[4968]: ERROR: tgtadm: invalid request
>>> Nov 20 12:23:18 cs1san1 iSCSILogicalUnit(cs1master1l1)[5081]: ERROR: tgtadm: invalid request
>>> Nov 20 12:30:28 cs1san1 iSCSILogicalUnit(cs1lb1l1)[2670]: ERROR: tgtadm: invalid request
>>> 
>>> NODE2:
>>> ======
>>> Nov 20 12:16:38 cs1san2 LVM(cs1vg1)[22039]: ERROR: Can't deactivate volume group "cs1vg1" with 3 open logical volume(s)
>>> Nov 20 12:22:55 cs1san2 LVM(cs1vg1)[3386]: ERROR: Can't deactivate volume group "cs1vg1" with 2 open logical volume(s)
>>> Nov 20 12:23:17 cs1san2 LVM(cs1vg1)[4296]: ERROR: Can't deactivate volume group "cs1vg1" with 1 open logical volume(s)
>>> Nov 20 12:30:27 cs1san2 LVM(cs1vg1)[14943]: ERROR: Can't deactivate volume group "cs1vg1" with 4 open logical volume(s)
>>> 
>>> which, to me, clearly indicates an ordering issue, yet the
>>> configuration I have follows the colocation/ordering rules insofar
>>> as I can understand them.
>>> 
>>> My "current" CRM config is as follows:
>>> ==============================================================================
>>> node cs1san1 \
>>> 	attributes standby="off"
>>> node cs1san2 \
>>> 	attributes standby="off"
>>> primitive alert ocf:heartbeat:MailTo \
>>> 	params email="ops at xyz.com" subject="CS takeover event" \
>>> 	op monitor interval="10s"
>>> primitive cs1ddb1l1 ocf:heartbeat:iSCSILogicalUnit \
>>> 	params target_iqn="iqn.2012-10.com.xyz.cs1san1:cs1ddb1d1" lun="1" path="/dev/cs1vg1/cs1ddb1d1" \
>>> 	op monitor interval="10" timeout="15"
>>> primitive cs1ddb1t1 ocf:heartbeat:iSCSITarget \
>>> 	params iqn="iqn.2012-10.com.xyz.cs1san1:cs1ddb1d1" tid="7" \
>>> 	op monitor interval="10" timeout="15"
>>> primitive cs1dws1l1 ocf:heartbeat:iSCSILogicalUnit \
>>> 	params target_iqn="iqn.2012-10.com.xyz.cs1san1:cs1dws1d1" lun="1" path="/dev/cs1vg1/cs1dws1d1" \
>>> 	op monitor interval="10" timeout="15"
>>> primitive cs1dws1t1 ocf:heartbeat:iSCSITarget \
>>> 	params iqn="iqn.2012-10.com.xyz.cs1san1:cs1dws1d1" tid="8" \
>>> 	op monitor interval="10" timeout="15"
>>> primitive cs1lb1l1 ocf:heartbeat:iSCSILogicalUnit \
>>> 	params target_iqn="iqn.2012-10.com.xyz.cs1san1:cs1lb1d1" lun="1" path="/dev/cs1vg1/cs1lb1d1" \
>>> 	op start interval="0" timeout="15" \
>>> 	op stop interval="0" timeout="15" \
>>> 	op monitor interval="10" timeout="15" \
>>> 	meta is-managed="true"
>>> primitive cs1lb1t1 ocf:heartbeat:iSCSITarget \
>>> 	params iqn="iqn.2012-10.com.xyz.cs1san1:cs1lb1d1" tid="1" \
>>> 	op monitor interval="10" timeout="15"
>>> primitive cs1lb2l1 ocf:heartbeat:iSCSILogicalUnit \
>>> 	params target_iqn="iqn.2012-10.com.xyz.cs1san2:cs1lb2d1" lun="1" path="/dev/cs1vg2/cs1lb2d1" \
>>> 	op start interval="0" timeout="15" \
>>> 	op stop interval="0" timeout="15" \
>>> 	op monitor interval="10" timeout="15"
>>> primitive cs1lb2t1 ocf:heartbeat:iSCSITarget \
>>> 	params iqn="iqn.2012-10.com.xyz.cs1san2:cs1lb2d1" tid="2" \
>>> 	op monitor interval="10" timeout="15"
>>> primitive cs1man1l1 ocf:heartbeat:iSCSILogicalUnit \
>>> 	params target_iqn="iqn.2012-10.com.xyz.cs1san1:cs1man1d1" lun="1" path="/dev/cs1vg1/cs1man1d1" \
>>> 	op monitor interval="10" timeout="15"
>>> primitive cs1man1t1 ocf:heartbeat:iSCSITarget \
>>> 	params iqn="iqn.2012-10.com.xyz.cs1san1:cs1man1d1" tid="5" \
>>> 	op monitor interval="10" timeout="15"
>>> primitive cs1master1l1 ocf:heartbeat:iSCSILogicalUnit \
>>> 	params target_iqn="iqn.2012-10.com.xyz.cs1san1:cs1master1d1" lun="1" path="/dev/cs1vg1/cs1master1d1" \
>>> 	op monitor interval="10" timeout="15"
>>> primitive cs1master1t1 ocf:heartbeat:iSCSITarget \
>>> 	params iqn="iqn.2012-10.com.xyz.cs1san1:cs1master1d1" tid="6" \
>>> 	op monitor interval="10" timeout="15"
>>> primitive cs1vg1 ocf:heartbeat:LVM \
>>> 	params exclusive="true" volgrpname="cs1vg1" \
>>> 	op start interval="0" timeout="30s" \
>>> 	op stop interval="0" timeout="30s" \
>>> 	meta target-role="Started"
>>> primitive cs1vg2 ocf:heartbeat:LVM \
>>> 	params exclusive="true" volgrpname="cs1vg2" \
>>> 	op start interval="0" timeout="30s" \
>>> 	op stop interval="0" timeout="30s" \
>>> 	meta target-role="Started"
>>> primitive ping ocf:pacemaker:ping \
>>> 	params host_list="10.96.0.1 10.96.0.2" attempts="3" timeout="2s" multiplier="100" dampen="5s" \
>>> 	op monitor interval="10s"
>>> primitive san1fencer stonith:fence_ipmilan \
>>> 	params pcmk_host_list="cs1san1" lanplus="1" ipaddr="10.96.0.21" login="admin" passwd="xxxxxxx" power_wait="4s" \
>>> 	op monitor interval="60s" \
>>> 	meta target-role="Started"
>>> primitive san1vip ocf:heartbeat:IPaddr2 \
>>> 	params ip="10.94.0.101" cidr_netmask="24" \
>>> 	op monitor interval="10s" \
>>> 	meta target-role="Started"
>>> primitive san2fencer stonith:fence_ipmilan \
>>> 	params pcmk_host_list="cs1san2" lanplus="1" ipaddr="10.96.0.22" login="admin" passwd="xxxxxxxx" power_wait="4s" \
>>> 	op monitor interval="60s" \
>>> 	meta target-role="Started"
>>> primitive san2vip ocf:heartbeat:IPaddr2 \
>>> 	params ip="10.94.0.102" cidr_netmask="24" \
>>> 	op monitor interval="10s" \
>>> 	meta target-role="Started"
>>> group cs1ddb1grp cs1ddb1t1 cs1ddb1l1 \
>>> 	meta target-role="Started"
>>> group cs1dws1grp cs1dws1t1 cs1dws1l1 \
>>> 	meta target-role="Started"
>>> group cs1lb1grp cs1lb1t1 cs1lb1l1 \
>>> 	meta target-role="Started"
>>> group cs1lb2grp cs1lb2t1 cs1lb2l1 \
>>> 	meta target-role="Started"
>>> group cs1man1grp cs1man1t1 cs1man1l1 \
>>> 	meta target-role="Started"
>>> group cs1master1grp cs1master1t1 cs1master1l1 \
>>> 	meta target-role="Started"
>>> clone alerts alert \
>>> 	meta target-role="Started"
>>> clone pings ping \
>>> 	meta target-role="Started"
>>> location san1fence san1fencer -inf: cs1san1
>>> location san1loc cs1vg1 \
>>> 	rule $id="san1loc-rule1" 50: #uname eq cs1san1 \
>>> 	rule $id="san1loc-rule2" pingd: defined ping
>>> location san2fence san2fencer -inf: cs1san2
>>> location san2loc cs1vg2 \
>>> 	rule $id="san2loc-rule1" 50: #uname eq cs1san2 \
>>> 	rule $id="san2loc-rule2" pingd: defined ping
>>> colocation san1colo inf: ( cs1lb1grp cs1man1grp cs1master1grp cs1ddb1grp cs1dws1grp ) san1vip cs1vg1
>>> colocation san2colo inf: ( cs1lb2grp ) san2vip cs1vg2
>> 
>> First my disclaimer - I don't use pacemaker for iSCSI so I'm not sure about *correct* ordering for iSCSI.
>> 
>> But after a quick glance, it looks like you are missing the ordering statements that coincide with your colocation statements.
>> Something like this I would assume:
>> order san1order inf: cs1vg1 san1vip ( cs1lb1grp cs1man1grp cs1master1grp cs1ddb1grp cs1dws1grp )
>> order san2order inf: cs1vg2 san2vip ( cs1lb2grp )
>> 
>> HTH
>> 
>> Jake
>> 
>>> property $id="cib-bootstrap-options" \
>>> 	dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
>>> 	cluster-infrastructure="openais" \
>>> 	expected-quorum-votes="2" \
>>> 	no-quorum-policy="ignore" \
>>> 	last-lrm-refresh="1353428951" \
>>> 	stonith-enabled="true" \
>>> 	maintenance-mode="false"
>>> ===================================================================
>>> 
>>> Regards
>>> Craig
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started:
>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>> 
>>> 
>> 
> 




