[Pacemaker] Resource starting problem

Mon Jun 20 06:26:56 EDT 2011

Hi,

On Wed, Jun 15, 2011 at 12:46:57PM +0200, Christian Roessner wrote:
> Hi,
> 
> this is my first post on this list. I hope this is the right
> mailing list for my question.
> 
> I have installed Pacemaker/Corosync on two Ubuntu Lucid servers to build
> a two-node cluster. This cluster will act as a router for a datacenter.
> I installed the distribution-provided packages, version 1.0.8 I believe.
> 
> The cluster is set up and mostly seems to work. I say "seems" because
> sometimes one of the resources does not start, and the logs show this
> as an "unknown error". The failure appears to be random, like rolling
> dice. But first of all, here is my crm config:
> 
> node bgwnode1 \
> 	attributes standby="off"
> node bgwnode2 \
> 	attributes standby="off"
> primitive resIPdatacenter ocf:heartbeat:IPaddr2 \
> 	meta migration-threshold="3" \
> 	op monitor interval="10s" timeout="20s" \
> 	params ip="10.0.0.1" nic="eth3" cidr_netmask="8"
> primitive resIPoffice ocf:heartbeat:IPaddr2 \
> 	meta migration-threshold="3" \
> 	op monitor interval="10s" timeout="20s" \
> 	params ip="192.168.20.1" nic="eth3" cidr_netmask="24"
> primitive resIPsubnet1 ocf:heartbeat:IPaddr2 \
> 	meta migration-threshold="3" \
> 	op monitor interval="10s" timeout="20s" \
> 	params ip="213.252.188.1" nic="eth3" cidr_netmask="25"
> primitive resIPtransfer ocf:heartbeat:IPaddr2 \
> 	meta migration-threshold="3" \
> 	op monitor interval="10s" timeout="20s" \
> 	params ip="212.68.95.210" nic="eth2" cidr_netmask="30"
> primitive resPing ocf:heartbeat:pingd \
> 	params host_list="172.16.1.1 172.16.1.2" dampen="5s" multiplier="100"
> primitive resRouteWANbcc ocf:heartbeat:Route \
> 	meta migration-threshold="3" \
> 	op monitor interval="10s" timeout="20s" \
> 	params destination="0.0.0.0/0" device="eth2" gateway="212.68.95.209"
> primitive resSysInfo ocf:heartbeat:SysInfo \
> 	op monitor interval="10s"
> clone clonePing resPing
> clone cloneSysInfo resSysInfo
> location locNetServices resIPtransfer \
> 	rule $id="locNetServices-rule" pingd: defined pingd
> xml <rsc_colocation id="totalColoc" score="INFINITY"> \
> 	<resource_set id="orderSetup-30bacef5" sequential="true"> \

I think that if you remove 'sequential="true"' the crm shell will
suddenly be able to render this element properly.

> 		<resource_ref id="resIPtransfer"/> \
> 		<resource_ref id="resIPsubnet1"/> \
> 		<resource_ref id="resIPoffice"/> \
> 		<resource_ref id="resIPdatacenter"/> \
> 		<resource_ref id="resRouteWANbcc"/> \
> 	</resource_set> \
> </rsc_colocation>
> property $id="cib-bootstrap-options" \
> 	dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
> 	cluster-infrastructure="openais" \
> 	expected-quorum-votes="2" \
> 	no-quorum-policy="ignore" \
> 	stonith-enabled="false"
> rsc_defaults $id="rsc-options" \
> 	resource-stickiness="INFINITY"
> 
> The resource "resRouteWANbcc" sometimes does not start and I really
> don't know why. I thought that the resource_set would start the
> resources one by one, and would only start later resources once the
> earlier ones had started successfully.

For that you need an order resource set too. Most of the time,
resource dependencies need to be expressed with both colocation
and order constraints. BTW, did you consider using a group
instead? A group implies both colocation and ordering among its
members.
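
A minimal sketch of the two options, assuming the resource names from
your configuration. The ids "grpRouter" and "orderSetup" are made up
for illustration, and the lines starting with # are annotations only,
not part of the configuration:

```
# Option A: one group. Members are started in the listed order, kept
# on the same node, and stopped in reverse order.
group grpRouter resIPtransfer resIPsubnet1 resIPoffice \
	resIPdatacenter resRouteWANbcc

# Option B: keep the existing colocation set and add an order
# constraint over the same resources.
order orderSetup inf: resIPtransfer resIPsubnet1 resIPoffice \
	resIPdatacenter resRouteWANbcc
```

Use one or the other, not both. With the group, the "totalColoc"
constraint becomes redundant and should probably go, and constraints
that currently name a single member (such as locNetServices) would
normally reference the group instead.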

> The route belongs to "resIPtransfer",
> which should have come up as the first resource.
> 
> I also thought about adding an ocf:heartbeat:Delay resource, but this did
> not work.
> 
> I also thought that the interface might take too long because of AutoNeg
> media detection, so I configured the interfaces appropriately. This does
> not fix the problem either.
> 
> Unfortunately if the default route is not HA, then the whole setup isn't.
> 
> And a second problem is detecting an unplugged cable. I realized that
> crm triggers the ifconfig up/down state.

What do you mean? The cluster never touches interfaces itself.
Resource agents (almost always) work only with aliases.

> So I simply installed ifplugd
> to monitor the ports and automatically bring interfaces up and down:
> 
> ARGS="-q -p -f -u0 -d0 -w -I -m ethtool"
> 
> But this also works only sometimes. So currently I am a little bit stuck :-)

I think that you need to fix this somewhere at the OS level.
IIRC, usually the interface isn't brought down on link loss (and
that sounds like a bad idea anyway).

Thanks,

Dejan
> 
> If some of you have some beginner's tips for me, I would appreciate it very much.
> 
> Thanks in advance
> 
> Christian Roessner
> -- 
> Roessner-Network-Solutions
> Bachelor of Science Informatik
> 50°34.725'N, 08°40.904'O, Nahrungsberg 81, 35390 Giessen
> F: +49 641 5879091, M: +49 176 93118939
> USt-IdNr.: DE225643613
> http://www.roessner-network-solutions.com
> 