[Pacemaker] Help with Pacemaker 2-node Router Setup

Michael Schwartzkopff misch at multinet.de
Sat Dec 26 05:48:39 EST 2009


On Saturday, 26 December 2009 11:27:54, Eric Renfro wrote:
> Michael Schwartzkopff wrote:
> > On Saturday, 26 December 2009 10:52:38, Eric Renfro wrote:
> >> Michael Schwartzkopff wrote:
> >>> On Saturday, 26 December 2009 08:12:49, Eric Renfro wrote:
> >>>> Hello,
> >>>>
> >>>> I'm trying to set up 2 nodes that'll run pacemaker with openais as the
> >>>> communication layer. Ideally what I want is for router1 to be the
> >>>> master node and take over from router2 when it comes back up fully
> >>>> functional again. In my setup, the routers are both internet-facing
> >>>> servers that move the external internet IP to whichever node controls
> >>>> it at the time, and also handle the internal IP that internal systems
> >>>> use as their gateway.
> >>>>
> >>>> My problem so far is with Route in my setup, and later with getting
> >>>> shorewall to start/stop depending on which node is active.
> >>>>
> >>>> Route, in my case in the setup I will show below, is failing to start
> >>>> initially because I presume the internet IP address is not fully
> >>>> initialized at the time it's trying to enable the route. If I do a crm
> >>>> resource cleanup failover-gw, it brings it up just fine. If I try to
> >>>> move the router_cluster resource to router2 from router1 after it's
> >>>> fully up, it fails because of failover-gw on router2.
> >>>
> >>> Very unlikely. If the IPaddr2 script finishes the IP address is up.
> >>> Please search for other reasons and grep "lrm.*failover-gw" in the
> >>> logs.
> >>>
> >>>> Here's my setup at present. For the moment shorewall is started
> >>>> manually. I want to automate this once the setup is working, and
> >>>> perhaps you guys could help me with that as well.
> >>>>
> >>>> primitive failover-int-ip ocf:heartbeat:IPaddr2 \
> >>>>         params ip="192.168.0.1" \
> >>>>         op monitor interval="2s"
> >>>> primitive failover-ext-ip ocf:heartbeat:IPaddr2 \
> >>>>         params ip="24.227.124.158" cidr_netmask="30" broadcast="24.227.124.159" nic="net0" \
> >>>>         op monitor interval="2s" \
> >>>>         meta target-role="Started"
> >>>> primitive failover-gw ocf:heartbeat:Route \
> >>>>         params destination="0.0.0.0/0" gateway="24.227.124.157" device="net0" \
> >>>>         meta target-role="Started" \
> >>>>         op monitor interval="2s"
> >>>> group router_cluster failover-int-ip failover-ext-ip failover-gw
> >>>> location router-master router_cluster \
> >>>>         rule $id="router-master-rule" $role="master" 100: #uname eq router1
> >>>>
> >>>> I would appreciate as much help as possible. I am fairly new to
> >>>> pacemaker, but so far all but the Route part of this works well.
> >>>
> >>> Please give us a chance to help you by providing the relevant logs!
> >>
> >> Sure..
> >> Here's a big clip of a log grepped for just failover-gw. I hope this
> >> helps. If not, I can pinpoint more around what's happening. The logs
> >> fill up pretty quickly as it's coming alive.
> >>
> >> messages:Dec 26 02:00:21 router1 pengine: [4724]: info: unpack_rsc_op:
> >> failover-gw_monitor_0 on router2 returned 5 (not installed) instead of
> >> the expected value: 7 (not running)
> >
> > (...)
> >
> > The rest of the log is not needed. The first line alone tells you that
> > something is not installed correctly. Please read the lines just
> > above this line. Normally they tell you what is missing.
> >
> > You should also read through the Route resource agent in
> > /usr/lib/ocf/resource.d/heartbeat/Route
> >
> > Greetings,
>
> Hmmm..
> I'm not seeing anything about it, here's a clip of the above lines, and
> one line below the one saying (not installed).
>
> Dec 26 05:00:21 router1 pengine: [4724]: info: determine_online_status:
> Node router1 is online
> Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op:
> failover-gw_monitor_0 on router1 returned 0 (ok) instead of the
> expected value: 7 (not running)
> Dec 26 05:00:21 router1 pengine: [4724]: WARN: unpack_rsc_op: Operation
> failover-gw_monitor_0 found resource failover-gw active on router1
> Dec 26 05:00:21 router1 pengine: [4724]: info: determine_online_status:
> Node router2 is online
> Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op:
> failover-gw_monitor_0 on router2 returned 5 (not installed) instead of
> the expected value: 7 (not running)
> Dec 26 05:00:21 router1 pengine: [4724]: ERROR: unpack_rsc_op: Hard
> error - failover-gw_monitor_0 failed with rc=5: Preventing failover-gw
> from re-starting on router2
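
For reference, the numeric codes in the pengine lines quoted above are the standard OCF resource agent exit codes. A minimal sketch of a lookup helper follows. The function name ocf_rc_name is made up for illustration and is not part of pacemaker or cluster-glue.

```shell
# Hypothetical helper: map an OCF resource agent exit code to its
# standard symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER" ;;
        *) echo "unknown" ;;
    esac
}
```

With this, ocf_rc_name 5 prints OCF_ERR_INSTALLED, the code the probe on router2 returned, and ocf_rc_name 7 prints OCF_NOT_RUNNING, the value a probe expects for a stopped resource.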

Hi,

There must be other log entries. In the Route RA I have here, the agent
writes the reason to the log via ocf_log() before it errors out. What
versions of pacemaker and cluster-glue do you have? Which distribution are
you running?
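
One way to see what the agent itself reports is to run its monitor action by hand on router2, outside the cluster. The sketch below assumes the agent path mentioned earlier in the thread and the usual OCF convention that resource parameters are passed as OCF_RESKEY_<name> environment variables. run_ra_action is a hypothetical helper name, and where the agent's messages end up (stderr or the system log) depends on the local logging setup.

```shell
# Hypothetical helper: run one action of an OCF resource agent by hand
# and report its exit code.
# Usage: run_ra_action AGENT_PATH ACTION [name=value ...]
run_ra_action() {
    agent=$1
    action=$2
    shift 2
    # Run in a subshell so the exported variables do not leak into the
    # calling shell.
    (
        # Assumed default, matching the agent path given in the thread.
        OCF_ROOT=${OCF_ROOT:-/usr/lib/ocf}
        export OCF_ROOT
        # Each name=value argument becomes OCF_RESKEY_name=value.
        for pair in "$@"; do
            export "OCF_RESKEY_$pair"
        done
        "$agent" "$action"
    )
    rc=$?
    echo "exit code: $rc"
    return "$rc"
}

# Example call, using the parameters of the failover-gw primitive:
#   run_ra_action /usr/lib/ocf/resource.d/heartbeat/Route monitor \
#       destination=0.0.0.0/0 gateway=24.227.124.157 device=net0
```

If the call reports exit code 5 on router2, any message printed just before it should name what the agent considers missing.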

Greetings,

-- 
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75

mail: misch at multinet.de
web: www.multinet.de

Registered office: 85630 Grasbrunn
Court of registration: Amtsgericht München HRB 114375
Managing directors: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42



