[Pacemaker] Help with Pacemaker 2-node Router Setup

Eric Renfro erenfro at gmail.com
Sat Dec 26 05:55:57 EST 2009


Michael Schwartzkopff wrote:
> On Saturday, 26 December 2009 11:27:54, Eric Renfro wrote:
>   
>> Michael Schwartzkopff wrote:
>>     
>>> On Saturday, 26 December 2009 10:52:38, Eric Renfro wrote:
>>>       
>>>> Michael Schwartzkopff wrote:
>>>>         
>>>>> On Saturday, 26 December 2009 08:12:49, Eric Renfro wrote:
>>>>>           
>>>>>> Hello,
>>>>>>
>>>>>> I'm trying to set up 2 nodes that'll run pacemaker with openais as the
>>>>>> communication layer. Ideally what I want is for router1 to be the
>>>>>> master node and take over from router2 again once router1 comes back
>>>>>> up fully functional. In my setup, the routers are both internet-facing
>>>>>> servers that move the external internet IP to whichever node controls
>>>>>> it at the time, and also handle the internal gateway IP that
>>>>>> internal systems route through.
>>>>>>
>>>>>> My problem so far is with Route in my setup, and later with getting
>>>>>> shorewall to start/stop according to whichever node is active.
>>>>>>
>>>>>> Route, in my case in the setup I will show below, is failing to start
>>>>>> initially because I presume the internet IP address is not fully
>>>>>> initialized at the time it's trying to enable the route. If I do a crm
>>>>>> resource cleanup failover-gw, it brings it up just fine. If I try to
>>>>>> move the router_cluster resource to router2 from router1 after it's
>>>>>> fully up, it fails because of failover-gw on router2.
>>>>>>             
>>>>> Very unlikely. If the IPaddr2 script finishes, the IP address is up.
>>>>> Please look for other causes and grep for "lrm.*failover-gw" in the
>>>>> logs.
>>>>>
>>>>>           
>>>>>> Here's my setup at present. For the moment shorewall is started
>>>>>> manually; I want to automate that once the setup is working, so
>>>>>> perhaps you guys could help me with that as well.
>>>>>>
>>>>>> primitive failover-int-ip ocf:heartbeat:IPaddr2 \
>>>>>>         params ip="192.168.0.1" \
>>>>>>         op monitor interval="2s"
>>>>>> primitive failover-ext-ip ocf:heartbeat:IPaddr2 \
>>>>>>         params ip="24.227.124.158" cidr_netmask="30" broadcast="24.227.124.159" nic="net0" \
>>>>>>         op monitor interval="2s" \
>>>>>>         meta target-role="Started"
>>>>>> primitive failover-gw ocf:heartbeat:Route \
>>>>>>         params destination="0.0.0.0/0" gateway="24.227.124.157" device="net0" \
>>>>>>         meta target-role="Started" \
>>>>>>         op monitor interval="2s"
>>>>>> group router_cluster failover-int-ip failover-ext-ip failover-gw
>>>>>> location router-master router_cluster \
>>>>>>         rule $id="router-master-rule" $role="master" 100: #uname eq router1
>>>>>>
>>>>>> I would appreciate as much help as possible. I am fairly new to
>>>>>> pacemaker, but so far all but the Route part of this works well.
>>>>>>             
>>>>> Please give us a chance to help you by providing the relevant logs!
>>>>>           
>>>> Sure..
>>>> Here's a big clip of the log, grepped for just failover-gw. If it
>>>> doesn't help, I can narrow down what's happening around it; the logs
>>>> fill up pretty quickly as the cluster comes alive.
>>>>
>>>> messages:Dec 26 02:00:21 router1 pengine: [4724]: info: unpack_rsc_op:
>>>> failover-gw_monitor_0 on router2 returned 5 (not installed) instead of
>>>> the expected value: 7 (not running)
>>>>         
>>> (...)
>>>
>>> The rest of the log is not needed. The first line alone tells you
>>> that something is not installed correctly. Please read the lines just
>>> above this line. Normally they tell you what is missing.
>>>
>>> You should also read through the routing resource agent in
>>> /usr/lib/ocf/resource.d/heartbeat/Route
>>>
>>> Greetings,
>>>       
>> Hmmm..
>> I'm not seeing anything about it. Here's a clip of the lines above,
>> plus the one line below the one saying (not installed).
>>
>> Dec 26 05:00:21 router1 pengine: [4724]: info: determine_online_status:
>> Node router1 is online
>> Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op:
>> failover-gw_monitor_0 on router1 returned 0 (ok) instead of the
>> expected value: 7 (not running)
>> Dec 26 05:00:21 router1 pengine: [4724]: WARN: unpack_rsc_op: Operation
>> failover-gw_monitor_0 found resource failover-gw active on router1
>> Dec 26 05:00:21 router1 pengine: [4724]: info: determine_online_status:
>> Node router2 is online
>> Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op:
>> failover-gw_monitor_0 on router2 returned 5 (not installed) instead of
>> the expected value: 7 (not running)
>> Dec 26 05:00:21 router1 pengine: [4724]: ERROR: unpack_rsc_op: Hard
>> error - failover-gw_monitor_0 failed with rc=5: Preventing failover-gw
>> from re-starting on router2
>>     
>
> Hi,
>
> There must be other log entries. In the Route RA I have, the agent writes
> the reason via ocf_log() before it errors out. What versions of pacemaker
> and cluster-glue do you have? What distribution are you running on?
>
> Greetings,
>
>   
I've checked all my logs. Syslog logs everything to my messages logfile, 
so it should be there if anywhere.
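
If the agent's own message never reaches syslog, one option would be to run
the monitor action by hand on router2 and look at the exit code directly. The
following is only a minimal sketch: the helper names ocf_rc_name and
probe_agent are hypothetical, and the numeric codes are the standard OCF
resource agent exit codes (5 is the "not installed" seen in the log above).

```shell
#!/bin/sh
# Translate an OCF exit code into its symbolic name.
# (Hypothetical helper; the codes are the standard OCF RA exit codes.)
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Run one action of an agent by hand and report its exit code.
# $1 = path to the agent, $2 = action (defaults to "monitor").
# (Hypothetical helper, not part of pacemaker or the resource agents.)
probe_agent() {
    agent=$1
    action=${2:-monitor}
    "$agent" "$action"
    rc=$?
    echo "$action returned $rc ($(ocf_rc_name "$rc"))"
    return "$rc"
}
```

Assuming the parameters from the configuration above, it could be used as root
on router2 roughly like this, with the resource parameters passed to the agent
as OCF_RESKEY_* environment variables:

    export OCF_ROOT=/usr/lib/ocf
    export OCF_RESKEY_destination="0.0.0.0/0"
    export OCF_RESKEY_gateway="24.227.124.157"
    export OCF_RESKEY_device="net0"
    probe_agent /usr/lib/ocf/resource.d/heartbeat/Route monitor

Run this way, anything the agent prints before exiting should show up on the
terminal rather than depending on the syslog setup.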

I'm running openSUSE 11.2, which ships heartbeat 2.99.3, pacemaker
1.0.1, and openais 0.80.3; those are the versions running in this setup.
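
For the shorewall question, a possible variant of the configuration quoted
above is sketched below. It rests on assumptions, not on anything tested: the
resource name failover-fw is hypothetical, lsb:shorewall assumes an
LSB-compliant /etc/init.d/shorewall with a working status action, and the
location constraint uses the plain node-preference form because
router_cluster is an ordinary group rather than a master/slave resource, so
the $role="master" rule may not apply to it.

```
primitive failover-fw lsb:shorewall \
        op monitor interval="30s"
group router_cluster failover-int-ip failover-ext-ip failover-gw failover-fw
location router-master router_cluster 100: router1
```

Since a group starts its members in order, shorewall would only start on a
node after both IP addresses and the default route are up there.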

--
Eric Renfro




