[Pacemaker] Help with Pacemaker 2-node Router Setup

Sat Dec 26 05:27:54 EST 2009

Michael Schwartzkopff wrote:
> On Saturday, 26 December 2009 10:52:38, Eric Renfro wrote:
>   
>> Michael Schwartzkopff wrote:
>>     
>>> On Saturday, 26 December 2009 08:12:49, Eric Renfro wrote:
>>>       
>>>> Hello,
>>>>
>>>> I'm trying to set up 2 nodes that'll run pacemaker with openais as the
>>>> communication layer. Ideally what I want is for router1 to be the master
>>>> node and take back over from router2 once router1 comes back up fully
>>>> functional again. In my setup, the routers are both internet-facing
>>>> servers that toggle the external internet IP to whichever controls it at
>>>> the time, and also handle the internal IP for the gateway for internal
>>>> systems to route via.
>>>>
>>>> So far, my problem is with Route in my setup; later on I also want to
>>>> get shorewall to start/stop on whichever node is active.
>>>>
>>>> Route, in my case in the setup I will show below, is failing to start
>>>> initially because I presume the internet IP address is not fully
>>>> initialized at the time it's trying to enable the route. If I do a crm
>>>> resource cleanup failover-gw, it brings it up just fine. If I try to
>>>> move the router_cluster resource to router2 from router1 after it's
>>>> fully up, it fails because of failover-gw on router2.
>>>>         
>>> Very unlikely. If the IPaddr2 script finishes, the IP address is up.
>>> Please search for other reasons and grep "lrm.*failover-gw" in the logs.
>>>
>>>       
>>>> Here's my setup at present. For the moment, until I figure out how to
>>>> automate it, shorewall is started manually. I want to automate this once
>>>> the setup is working, though; perhaps you guys could help me with that
>>>> as well.
>>>>
>>>> primitive failover-int-ip ocf:heartbeat:IPaddr2 \
>>>>         params ip="192.168.0.1" \
>>>>         op monitor interval="2s"
>>>> primitive failover-ext-ip ocf:heartbeat:IPaddr2 \
>>>>         params ip="24.227.124.158" cidr_netmask="30" broadcast="24.227.124.159" nic="net0" \
>>>>         op monitor interval="2s" \
>>>>         meta target-role="Started"
>>>> primitive failover-gw ocf:heartbeat:Route \
>>>>         params destination="0.0.0.0/0" gateway="24.227.124.157" device="net0" \
>>>>         meta target-role="Started" \
>>>>         op monitor interval="2s"
>>>> group router_cluster failover-int-ip failover-ext-ip failover-gw
>>>> location router-master router_cluster \
>>>>         rule $id="router-master-rule" $role="master" 100: #uname eq router1
>>>>
>>>> I would appreciate as much help as possible. I am fairly new to
>>>> pacemaker, but so far all but the Route part of this works well.
>>>>         
>>> Please give us a chance to help you by providing the relevant logs!
>>>       
>> Sure..
>> Here's a big clip of the log, grepped for just failover-gw. I hope this
>> helps; otherwise I can pinpoint more of what's happening around it. The
>> logs fill up pretty quickly as the cluster comes alive.
>>
>> messages:Dec 26 02:00:21 router1 pengine: [4724]: info: unpack_rsc_op: failover-gw_monitor_0 on router2 returned 5 (not installed) instead of the expected value: 7 (not running)
>>     
> (...)
>
> The rest of the logs is not needed. Just the first line tells you that
> something is not installed correctly. Please read the lines just above this
> line. Normally they tell you what is missing.
>
> You should also read through the Route resource agent in
> /usr/lib/ocf/resource.d/heartbeat/Route
>
> Greetings,
>
>
>   
Hmmm..
I'm not seeing anything about it. Here's a clip of the lines above it,
plus one line below the one saying (not installed).

Dec 26 05:00:21 router1 pengine: [4724]: info: determine_online_status: Node router1 is online
Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op: failover-gw_monitor_0 on router1 returned 0 (ok) instead of the expected value: 7 (not running)
Dec 26 05:00:21 router1 pengine: [4724]: WARN: unpack_rsc_op: Operation failover-gw_monitor_0 found resource failover-gw active on router1
Dec 26 05:00:21 router1 pengine: [4724]: info: determine_online_status: Node router2 is online
Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op: failover-gw_monitor_0 on router2 returned 5 (not installed) instead of the expected value: 7 (not running)
Dec 26 05:00:21 router1 pengine: [4724]: ERROR: unpack_rsc_op: Hard error - failover-gw_monitor_0 failed with rc=5: Preventing failover-gw from re-starting on router2

Like I said, though, it is strange: if I clear the failover-gw status,
it starts normally and stays running fine. Only when it's brought up
from a cold boot does it fail, and it does so every time. When it starts
up into standby, it also fails.
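
In case it helps anyone sifting through similar logs, here is a minimal
sketch in Python that pulls out the probe results for one resource and
labels each return code. It assumes each log entry sits on a single line
in the pengine format quoted above. The function name
find_unexpected_probes and the SAMPLE text are only illustrations of
mine, not part of pacemaker; the code-to-name table follows the usual OCF
resource agent exit codes.

```python
import re

# Conventional OCF resource agent exit codes (0-7 only).
OCF_RC = {
    0: "ok",
    1: "unknown error",
    2: "invalid parameter",
    3: "unimplemented feature",
    4: "insufficient privileges",
    5: "not installed",
    6: "not configured",
    7: "not running",
}

# Matches pengine lines such as:
#   unpack_rsc_op: failover-gw_monitor_0 on router2 returned 5
#   (not installed) instead of the expected value: 7 (not running)
PROBE_RE = re.compile(
    r"unpack_rsc_op:\s+(?P<op>\S+)\s+on\s+(?P<node>\S+)\s+"
    r"returned\s+(?P<rc>\d+)\s+\([^)]*\)\s+instead\s+of\s+the\s+"
    r"expected\s+value:\s+(?P<expected>\d+)"
)


def find_unexpected_probes(log_text, resource):
    """Return (node, operation, rc, meaning) for each operation of
    `resource` whose result differed from what pengine expected."""
    hits = []
    for m in PROBE_RE.finditer(log_text):
        op = m.group("op")
        # Operation ids look like <resource>_<action>_<interval>.
        if not op.startswith(resource + "_"):
            continue
        rc = int(m.group("rc"))
        hits.append((m.group("node"), op, rc, OCF_RC.get(rc, "unrecognized")))
    return hits


# Hypothetical sample, copied from the two unpack_rsc_op lines in this thread.
SAMPLE = (
    "Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op: "
    "failover-gw_monitor_0 on router1 returned 0 (ok) instead of the "
    "expected value: 7 (not running)\n"
    "Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op: "
    "failover-gw_monitor_0 on router2 returned 5 (not installed) instead of "
    "the expected value: 7 (not running)\n"
)

for node, op, rc, meaning in find_unexpected_probes(SAMPLE, "failover-gw"):
    print(f"{node}: {op} rc={rc} ({meaning})")
```

Run against the sample, this lists the probe on router1 (rc=0) and the
one on router2 (rc=5), which at least narrows down which node to look at
first. It does not say why the agent returned that code.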