[Pacemaker] Help with Pacemaker 2-node Router Setup

Michael Schwartzkopff misch at multinet.de
Sat Dec 26 06:13:24 EST 2009


On Saturday, 26 December 2009 11:55:57, Eric Renfro wrote:
> Michael Schwartzkopff wrote:
> > On Saturday, 26 December 2009 11:27:54, Eric Renfro wrote:
> >> Michael Schwartzkopff wrote:
> >>> On Saturday, 26 December 2009 10:52:38, Eric Renfro wrote:
> >>>> Michael Schwartzkopff wrote:
> >>>>> On Saturday, 26 December 2009 08:12:49, Eric Renfro wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> I'm trying to setup 2 nodes that'll run pacemaker with openais as
> >>>>>> the communication layer. Ideally what I want is for router1 to be
> >>>>>> the master node and take over for router2 if it comes back up fully
> >>>>>> functional again. In my setup, the routers are both internet-facing
> >>>>>> servers that toggle the external internet IP to whichever controls
> >>>>>> it at the time, and also handles the internal IP for the gateway for
> >>>>>> internal systems to route via.
> >>>>>>
> >>>>>> So far, my problem is with Route in my setup, and later with
> >>>>>> getting shorewall to start/stop on whichever node is active.
> >>>>>>
> >>>>>> Route, in my case in the setup I will show below, is failing to
> >>>>>> start initially because I presume the internet IP address is not
> >>>>>> fully initialized at the time it's trying to enable the route. If I
> >>>>>> do a crm resource cleanup failover-gw, it brings it up just fine. If
> >>>>>> I try to move the router_cluster resource to router2 from router1
> >>>>>> after it's fully up, it fails because of failover-gw on router2.
> >>>>>
> >>>>> Very unlikely. If the IPaddr2 script finishes the IP address is up.
> >>>>> Please search for other reasons and grep "lrm.*failover-gw" in the
> >>>>> logs.
> >>>>>
> >>>>>> Here's my setup at present. For the moment, until I figure out how
> >>>>>> to do it, shorewall is started manually, I want to automate this
> >>>>>> once the setup is working, though, perhaps you guys could help me
> >>>>>> with that as well.
> >>>>>>
> >>>>>> primitive failover-int-ip ocf:heartbeat:IPaddr2 \
> >>>>>>         params ip="192.168.0.1" \
> >>>>>>         op monitor interval="2s"
> >>>>>> primitive failover-ext-ip ocf:heartbeat:IPaddr2 \
> >>>>>>         params ip="24.227.124.158" cidr_netmask="30" broadcast="24.227.124.159" nic="net0" \
> >>>>>>         op monitor interval="2s" \
> >>>>>>         meta target-role="Started"
> >>>>>> primitive failover-gw ocf:heartbeat:Route \
> >>>>>>         params destination="0.0.0.0/0" gateway="24.227.124.157" device="net0" \
> >>>>>>         meta target-role="Started" \
> >>>>>>         op monitor interval="2s"
> >>>>>> group router_cluster failover-int-ip failover-ext-ip failover-gw
> >>>>>> location router-master router_cluster \
> >>>>>>         rule $id="router-master-rule" $role="master" 100: #uname eq router1
> >>>>>>
> >>>>>> I would appreciate as much help as possible. I am fairly new to
> >>>>>> pacemaker, but so far all but the Route part of this works well.
> >>>>>
> >>>>> Please give us a chance to help you by providing the relevant logs!
> >>>>
> >>>> Sure..
> >>>> Here's a big clip of a log grepped from just failover-gw, if this
> >>>> helps hopefully, else, I can pinpoint more around what's happening,
> >>>> the logs fill up pretty quickly as it's coming alive.
> >>>>
> >>>> messages:Dec 26 02:00:21 router1 pengine: [4724]: info: unpack_rsc_op:
> >>>> failover-gw_monitor_0 on router2 returned 5 (not installed) instead of
> >>>> the expected value: 7 (not running)
> >>>
> >>> (...)
> >>>
> >>> The rest of the logs is not needed. Just the first line tells you
> >>> that something is not installed correctly. Please read the lines just
> >>> above this line. Normally they tell you what is missing.
> >>>
> >>> You should also read through the Route resource agent in
> >>> /usr/lib/ocf/resource.d/heartbeat/Route
> >>>
> >>> Greetings,
> >>
> >> Hmmm..
> >> I'm not seeing anything about it, here's a clip of the above lines, and
> >> one line below the one saying (not installed).
> >>
> >> Dec 26 05:00:21 router1 pengine: [4724]: info: determine_online_status:
> >> Node router1 is online
> >> Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op:
> >> failover-gw_monitor_0 on router1 returned 0 (ok) instead of the
> >> expected value: 7 (not running)
> >> Dec 26 05:00:21 router1 pengine: [4724]: WARN: unpack_rsc_op: Operation
> >> failover-gw_monitor_0 found resource failover-gw active on router1
> >> Dec 26 05:00:21 router1 pengine: [4724]: info: determine_online_status:
> >> Node router2 is online
> >> Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op:
> >> failover-gw_monitor_0 on router2 returned 5 (not installed) instead of
> >>  the expected value: 7 (not running)
> >> Dec 26 05:00:21 router1 pengine: [4724]: ERROR: unpack_rsc_op: Hard
> >> error - failover-gw_monitor_0 failed with rc=5: Preventing failover-gw
> >> from re-starting on router2
> >
> > Hi,
> >
> > There must be other log entries. In the Route RA I have, the agent
> > writes the reason via ocf_log() before it errors out. What versions of
> > pacemaker and cluster-glue do you have? What distribution are you
> > running on?
> >
> > Greetings,
>
> I've checked all my logs. Syslog logs everything to my messages logfile,
> so it should be there if anywhere.
>
> I'm running OpenSUSE 11.2 which comes with heartbeat 2.99.3, pacemaker
> 1.0.1, openais 0.80.3, as to what all's running in this setup.

Hm. This is already quite an old version of pacemaker, but it should work
anyway. Could you please test the resource agent manually on router1?

export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_destination="0.0.0.0/0"
export OCF_RESKEY_gateway="24.227.124.157"

/usr/lib/ocf/resource.d/heartbeat/Route monitor; echo $?
should result in 0 (started) or 7 (not started)

/usr/lib/ocf/resource.d/heartbeat/Route start; echo $?
should add the default route and result in 0

/usr/lib/ocf/resource.d/heartbeat/Route monitor; echo $?
should result in 0 (started)

/usr/lib/ocf/resource.d/heartbeat/Route stop; echo $?
should delete the default route and result in 0

/usr/lib/ocf/resource.d/heartbeat/Route monitor; echo $?
should result in 7 (not started)

If this does not work as expected, are there any error messages?
Please see if you can debug the Route script.
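
To make the manual test above easier to repeat on both routers, here is a
minimal sketch of two shell helpers. The function names ocf_rc_name and
check_ra are made up for this example and are not part of Pacemaker or the
resource agents; only the exit code names (0 to 7) are the standard OCF
ones. It assumes the OCF_ROOT and OCF_RESKEY_* variables are already
exported as shown above.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Pacemaker) for testing a resource
# agent by hand and comparing its exit code with the expected value.

# Map a standard OCF exit code to its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# check_ra AGENT ACTION EXPECTED_RC...
# Runs "AGENT ACTION" and reports whether the exit code is one of the
# expected values. Returns 0 on a match, 1 otherwise.
check_ra() {
    agent=$1
    action=$2
    shift 2
    "$agent" "$action"
    rc=$?
    for want in "$@"; do
        if [ "$rc" -eq "$want" ]; then
            echo "$action: rc=$rc ($(ocf_rc_name "$rc")) as expected"
            return 0
        fi
    done
    echo "$action: rc=$rc ($(ocf_rc_name "$rc")) UNEXPECTED, wanted: $*"
    return 1
}

# Possible usage, mirroring the sequence above:
#   RA=/usr/lib/ocf/resource.d/heartbeat/Route
#   check_ra "$RA" monitor 0 7
#   check_ra "$RA" start 0
#   check_ra "$RA" monitor 0
#   check_ra "$RA" stop 0
#   check_ra "$RA" monitor 7
```

A result of 5 (OCF_ERR_INSTALLED) on router2 would match the "not
installed" message in the pengine log. If the agent turns out to need the
interface as well, OCF_RESKEY_device="net0" from the configuration could be
exported too. Since the agent should be a shell script, running it with
"sh -x" in front may show which check fails.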

Greetings,

-- 
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75

mail: misch at multinet.de
web: www.multinet.de

Registered office: 85630 Grasbrunn
Court of registration: Amtsgericht München HRB 114375
Managing directors: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42



