[Pacemaker] Help with Pacemaker 2-node Router Setup

Eric Renfro erenfro at gmail.com
Sun Dec 27 05:50:23 UTC 2009


Michael Schwartzkopff wrote:
> On Saturday, 26 December 2009 at 11:55:57, Eric Renfro wrote:
>   
>> Michael Schwartzkopff wrote:
>>     
>>> On Saturday, 26 December 2009 at 11:27:54, Eric Renfro wrote:
>>>       
>>>> Michael Schwartzkopff wrote:
>>>>         
>>>>> On Saturday, 26 December 2009 at 10:52:38, Eric Renfro wrote:
>>>>>           
>>>>>> Michael Schwartzkopff wrote:
>>>>>>             
>>>>>>> On Saturday, 26 December 2009 at 08:12:49, Eric Renfro wrote:
>>>>>>>               
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I'm trying to set up 2 nodes that'll run pacemaker with openais as
>>>>>>>> the communication layer. Ideally, I want router1 to be the master
>>>>>>>> node and to take back over from router2 once it comes back up fully
>>>>>>>> functional again. In my setup, both routers are internet-facing
>>>>>>>> servers that move the external internet IP to whichever node
>>>>>>>> controls it at the time, and they also handle the internal gateway
>>>>>>>> IP that internal systems route through.
>>>>>>>>
>>>>>>>> My problem so far is with Route in my setup, and later with getting
>>>>>>>> shorewall to start/stop on whichever node is active.
>>>>>>>>
>>>>>>>> Route, in the setup I will show below, is failing to start
>>>>>>>> initially; I presume the internet IP address is not fully
>>>>>>>> initialized at the time it tries to enable the route. If I do a
>>>>>>>> crm resource cleanup failover-gw, it brings it up just fine. If I
>>>>>>>> then try to move the router_cluster resource from router1 to
>>>>>>>> router2 after it's fully up, it fails because of failover-gw on
>>>>>>>> router2.
>>>>>>>>                 
>>>>>>> Very unlikely. If the IPaddr2 script finishes, the IP address is up.
>>>>>>> Please search for other reasons and grep for "lrm.*failover-gw" in
>>>>>>> the logs.
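>>>>>>> For example (assuming syslog writes to /var/log/messages, as you
>>>>>>> mention below):
>>>>>>>
>>>>>>>   grep 'lrm.*failover-gw' /var/log/messages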
>>>>>>>
>>>>>>>               
>>>>>>>> Here's my setup at present. For the moment, until I figure out how
>>>>>>>> to do it, shorewall is started manually; I want to automate this
>>>>>>>> once the setup is working, though, and perhaps you guys could help
>>>>>>>> me with that as well.
>>>>>>>>
>>>>>>>> primitive failover-int-ip ocf:heartbeat:IPaddr2 \
>>>>>>>>         params ip="192.168.0.1" \
>>>>>>>>         op monitor interval="2s"
>>>>>>>> primitive failover-ext-ip ocf:heartbeat:IPaddr2 \
>>>>>>>>         params ip="24.227.124.158" cidr_netmask="30"
>>>>>>>> broadcast="24.227.124.159" nic="net0" \
>>>>>>>>         op monitor interval="2s" \
>>>>>>>>         meta target-role="Started"
>>>>>>>> primitive failover-gw ocf:heartbeat:Route \
>>>>>>>>         params destination="0.0.0.0/0" gateway="24.227.124.157" \
>>>>>>>>               device="net0" \
>>>>>>>>         meta target-role="Started" \
>>>>>>>>         op monitor interval="2s"
>>>>>>>> group router_cluster failover-int-ip failover-ext-ip failover-gw
>>>>>>>> location router-master router_cluster \
>>>>>>>>         rule $id="router-master-rule" $role="master" 100: #uname eq router1
>>>>>>>>
>>>>>>>> I would appreciate as much help as possible. I am fairly new to
>>>>>>>> pacemaker, but so far all but the Route part of this works well.
>>>>>>>>                 
>>>>>>> Please give us a chance to help you providing the interesting logs!
>>>>>>>               
>>>>>> Sure.
>>>>>> Here's a big clip of the log, grepped for just failover-gw. Hopefully
>>>>>> this helps; if not, I can pinpoint more of what's happening, since
>>>>>> the logs fill up pretty quickly as the cluster comes alive.
>>>>>>
>>>>>> messages:Dec 26 02:00:21 router1 pengine: [4724]: info: unpack_rsc_op:
>>>>>> failover-gw_monitor_0 on router2 returned 5 (not installed) instead of
>>>>>> the expected value: 7 (not running)
>>>>>>             
>>>>> (...)
>>>>>
>>>>> The rest of the logs is not needed. The first line already tells you
>>>>> that something is not installed correctly. Please read the lines just
>>>>> above this one; normally they tell you what is missing.
>>>>>
>>>>> You could also read through the routing resource agent in
>>>>> /usr/lib/ocf/resource.d/heartbeat/Route
>>>>>
>>>>> Greetings,
>>>>>           
>>>> Hmmm.
>>>> I'm not seeing anything about it. Here's a clip of the lines above,
>>>> plus one line below the one saying (not installed).
>>>>
>>>> Dec 26 05:00:21 router1 pengine: [4724]: info: determine_online_status:
>>>> Node router1 is online
>>>> Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op:
>>>> failover-gw_monitor_0 on router1 returned 0 (ok) instead of the
>>>> expected value: 7 (not running)
>>>> Dec 26 05:00:21 router1 pengine: [4724]: WARN: unpack_rsc_op: Operation
>>>> failover-gw_monitor_0 found resource failover-gw active on router1
>>>> Dec 26 05:00:21 router1 pengine: [4724]: info: determine_online_status:
>>>> Node router2 is online
>>>> Dec 26 05:00:21 router1 pengine: [4724]: info: unpack_rsc_op:
>>>> failover-gw_monitor_0 on router2 returned 5 (not installed) instead of
>>>> the expected value: 7 (not running)
>>>> Dec 26 05:00:21 router1 pengine: [4724]: ERROR: unpack_rsc_op: Hard
>>>> error - failover-gw_monitor_0 failed with rc=5: Preventing failover-gw
>>>> from re-starting on router2
>>>>         
>>> Hi,
>>>
>>> there must be other log entries. In the Route RA, before erroring out,
>>> the agent writes the reasons via ocf_log(). What versions of pacemaker
>>> and cluster-glue do you have? What distribution are you running on?
>>>
>>> Greetings,
>>>       
>> I've checked all my logs. Syslog logs everything to my messages logfile,
>> so it should be there if anywhere.
>>
>> I'm running OpenSUSE 11.2, which comes with heartbeat 2.99.3, pacemaker
>> 1.0.1, and openais 0.80.3; that's what's running in this setup.
>>     
>
> Hm. This is already a quite old version of pacemaker, but it should run
> anyway. Could you please check the resource manually on router1.
>
> export OCF_ROOT=/usr/lib/ocf
> export OCF_RESKEY_destination="0.0.0.0/0"
> export OCF_RESKEY_gateway="24.227.124.157"
>
> /usr/lib/ocf/resource.d/heartbeat/Route monitor; echo $?
> should result in 0 (started) or 7 (not started)
>
> /usr/lib/ocf/resource.d/heartbeat/Route start; echo $?
> should add the default route and result in 0
>
> /usr/lib/ocf/resource.d/heartbeat/Route monitor; echo $?
> should result in 0 (started)
>
> /usr/lib/ocf/resource.d/heartbeat/Route stop; echo $?
> should delete the default route and result in 0
>
> /usr/lib/ocf/resource.d/heartbeat/Route monitor; echo $?
> should result in 7 (not started)
>
> If this does not work as expected, are there any error messages?
> Please see if you can debug the Route script.
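> One way to do that (just a sketch, not the only option) is to trace the
> agent with the shell's -x flag, reusing the exports above:
>
> sh -x /usr/lib/ocf/resource.d/heartbeat/Route monitor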
>
> Greetings,
>
>   
I did all these tests, and all results came back normal. The first monitor
returned 7 (not started); start returned 0 and the following monitor
returned 0; stop returned 0, and the monitor after stopping returned 7.

It seems the error in my case happens earlier in the startup, which then
prevents the resource from starting afterwards. Here's the current setup:

primitive intIP ocf:heartbeat:IPaddr2 \
         params ip="192.168.0.1" cidr_netmask="16" broadcast="192.168.255.255" nic="lan0"
primitive extIP ocf:heartbeat:IPaddr2 \
         params ip="24.227.124.158" cidr_netmask="30" broadcast="24.227.124.159" nic="net0"
primitive resRoute ocf:heartbeat:Route \
         params destination="0.0.0.0/0" gateway="24.227.124.157"
primitive firewall lsb:shorewall
group router_cluster extIP intIP resRoute firewall
location router-master router_cluster \
         rule $id="router-master-rule" $role="master" 100: #uname eq router1
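
After editing, I also sanity-check the live configuration, as the PE logs
below themselves suggest (-V just adds verbosity):

crm_verify -L -V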

And here's the log:

Dec 27 00:23:39 router2 mgmtd: [25787]: debug: main: run the loop...
Dec 27 00:23:39 router2 mgmtd: [25787]: info: Started.
Dec 27 00:23:39 router2 cib: [25790]: info: retrieveCib: Reading cluster 
configuration from: /var/lib/heartbeat/crm/cib.RqbSTT (digest: 
/var/lib/heartbeat/crm/cib.CNwZjy)
Dec 27 00:24:37 router2 crmd: [25786]: info: crm_timer_popped: Election 
Trigger (I_DC_TIMEOUT) just popped!
Dec 27 00:24:37 router2 crmd: [25786]: WARN: do_log: FSA: Input 
I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Dec 27 00:24:37 router2 crmd: [25786]: info: do_state_transition: State 
transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT 
cause=C_TIMER_POPPED origin=crm_timer_popped ]
Dec 27 00:24:37 router2 crmd: [25786]: info: do_state_transition: State 
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC 
cause=C_FSA_INTERNAL origin=do_election_check ]
Dec 27 00:24:37 router2 crmd: [25786]: info: do_te_control: Registering 
TE UUID: 804c8e68-f81f-4d16-b9ab-6c6a4ea41b57
Dec 27 00:24:37 router2 crmd: [25786]: WARN: 
cib_client_add_notify_callback: Callback already present
Dec 27 00:24:37 router2 crmd: [25786]: info: set_graph_functions: 
Setting custom graph functions
Dec 27 00:24:37 router2 crmd: [25786]: info: unpack_graph: Unpacked 
transition -1: 0 actions in 0 synapses
Dec 27 00:24:37 router2 crmd: [25786]: info: do_dc_takeover: Taking over 
DC status for this partition
Dec 27 00:24:37 router2 cib: [25782]: info: cib_process_readwrite: We 
are now in R/W mode
Dec 27 00:24:37 router2 cib: [25782]: info: cib_process_request: 
Operation complete: op cib_master for section 'all' 
(origin=local/crmd/6, version=0.135.0): ok (rc=0)
Dec 27 00:24:37 router2 cib: [25782]: info: cib_process_request: 
Operation complete: op cib_modify for section cib (origin=local/crmd/7, 
version=0.135.0): ok (rc=0)
Dec 27 00:24:37 router2 cib: [25782]: info: cib_process_request: 
Operation complete: op cib_modify for section crm_config 
(origin=local/crmd/9, version=0.135.0): ok (rc=0)
Dec 27 00:24:37 router2 crmd: [25786]: info: join_make_offer: Making 
join offers based on membership 2796
Dec 27 00:24:37 router2 cib: [25782]: info: cib_process_request: 
Operation complete: op cib_modify for section crm_config 
(origin=local/crmd/11, version=0.135.0): ok (rc=0)
Dec 27 00:24:37 router2 crmd: [25786]: info: do_dc_join_offer_all: 
join-1: Waiting on 1 outstanding join acks
Dec 27 00:24:37 router2 crmd: [25786]: info: ais_dispatch: Membership 
2796: quorum still lost
Dec 27 00:24:37 router2 crmd: [25786]: info: config_query_callback: 
Checking for expired actions every 900000ms
Dec 27 00:24:37 router2 cib: [25782]: info: cib_process_request: 
Operation complete: op cib_modify for section crm_config 
(origin=local/crmd/14, version=0.135.0): ok (rc=0)
Dec 27 00:24:37 router2 crmd: [25786]: info: update_dc: Set DC to 
router2 (3.0.1)
Dec 27 00:24:37 router2 crmd: [25786]: info: ais_dispatch: Membership 
2796: quorum still lost
Dec 27 00:24:37 router2 cib: [25782]: info: cib_process_request: 
Operation complete: op cib_modify for section crm_config 
(origin=local/crmd/17, version=0.135.0): ok (rc=0)
Dec 27 00:24:37 router2 crmd: [25786]: info: do_state_transition: State 
transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED 
cause=C_FSA_INTERNAL origin=check_join_state ]
Dec 27 00:24:37 router2 crmd: [25786]: info: do_state_transition: All 1 
cluster nodes responded to the join offer.
Dec 27 00:24:37 router2 crmd: [25786]: info: do_dc_join_finalize: 
join-1: Syncing the CIB from router2 to the rest of the cluster
Dec 27 00:24:37 router2 crmd: [25786]: info: te_connect_stonith: 
Attempting connection to fencing daemon...
Dec 27 00:24:37 router2 cib: [25782]: info: cib_process_request: 
Operation complete: op cib_sync for section 'all' (origin=local/crmd/18, 
version=0.135.0): ok (rc=0)
Dec 27 00:24:38 router2 crmd: [25786]: info: te_connect_stonith: Connected
Dec 27 00:24:38 router2 cib: [25782]: info: cib_process_request: 
Operation complete: op cib_modify for section nodes 
(origin=local/crmd/19, version=0.135.0): ok (rc=0)
Dec 27 00:24:38 router2 crmd: [25786]: info: update_attrd: Connecting to 
attrd...
Dec 27 00:24:38 router2 cib: [25782]: info: cib_process_request: 
Operation complete: op cib_delete for section 
//node_state[@uname='router2']/transient_attributes 
(origin=local/crmd/20, version=0.135.0): ok (rc=0)
Dec 27 00:24:38 router2 cib: [25782]: info: cib_process_request: 
Operation complete: op cib_delete for section 
//node_state[@uname='router2']/lrm (origin=local/crmd/21, 
version=0.135.0): ok (rc=0)
Dec 27 00:24:38 router2 attrd: [25784]: info: find_hash_entry: Creating 
hash entry for terminate
Dec 27 00:24:38 router2 attrd: [25784]: info: find_hash_entry: Creating 
hash entry for shutdown
Dec 27 00:24:38 router2 crmd: [25786]: info: erase_xpath_callback: 
Deletion of "//node_state[@uname='router2']/transient_attributes": ok (rc=0)
Dec 27 00:24:38 router2 crmd: [25786]: info: erase_xpath_callback: 
Deletion of "//node_state[@uname='router2']/lrm": ok (rc=0)
Dec 27 00:24:38 router2 crmd: [25786]: info: do_dc_join_ack: join-1: 
Updating node state to member for router2
Dec 27 00:24:38 router2 cib: [25782]: info: cib_process_request: 
Operation complete: op cib_delete for section 
//node_state[@uname='router2']/lrm (origin=local/crmd/22, 
version=0.135.0): ok (rc=0)
Dec 27 00:24:39 router2 crmd: [25786]: info: erase_xpath_callback: 
Deletion of "//node_state[@uname='router2']/lrm": ok (rc=0)
Dec 27 00:24:39 router2 crmd: [25786]: info: do_state_transition: State 
transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED 
cause=C_FSA_INTERNAL origin=check_join_state ]
Dec 27 00:24:39 router2 crmd: [25786]: info: do_state_transition: All 1 
cluster nodes are eligible to run resources.
Dec 27 00:24:39 router2 cib: [25782]: info: cib_process_request: 
Operation complete: op cib_modify for section nodes 
(origin=local/crmd/24, version=0.135.1): ok (rc=0)
Dec 27 00:24:39 router2 crmd: [25786]: info: do_dc_join_final: Ensuring 
DC, quorum and node attributes are up-to-date
Dec 27 00:24:39 router2 crmd: [25786]: info: crm_update_quorum: Updating 
quorum status to false (call=26)
Dec 27 00:24:39 router2 attrd: [25784]: info: attrd_local_callback: 
Sending full refresh (origin=crmd)
Dec 27 00:24:39 router2 cib: [25782]: info: log_data_element: cib:diff: 
- <cib admin_epoch="0" epoch="135" num_updates="1" />
Dec 27 00:24:39 router2 crmd: [25786]: info: abort_transition_graph: 
do_te_invoke:190 - Triggered transition abort (complete=1) : Peer Cancelled
Dec 27 00:24:39 router2 attrd: [25784]: info: attrd_trigger_update: 
Sending flush op to all hosts for: terminate (<null>)
Dec 27 00:24:39 router2 cib: [25782]: info: log_data_element: cib:diff: 
+ <cib dc-uuid="router2" admin_epoch="0" epoch="136" num_updates="1" />
Dec 27 00:24:39 router2 crmd: [25786]: info: do_pe_invoke: Query 27: 
Requesting the current CIB: S_POLICY_ENGINE
Dec 27 00:24:39 router2 cib: [25782]: info: cib_process_request: 
Operation complete: op cib_modify for section cib (origin=local/crmd/26, 
version=0.136.1): ok (rc=0)
Dec 27 00:24:39 router2 crmd: [25786]: info: abort_transition_graph: 
need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Dec 27 00:24:39 router2 attrd: [25784]: info: attrd_trigger_update: 
Sending flush op to all hosts for: shutdown (<null>)
Dec 27 00:24:39 router2 crmd: [25786]: info: need_abort: Aborting on 
change to admin_epoch
Dec 27 00:24:39 router2 crmd: [25786]: info: do_pe_invoke_callback: 
Invoking the PE: ref=pe_calc-dc-1261891479-7, seq=2796, quorate=0
Dec 27 00:24:39 router2 pengine: [25785]: notice: unpack_config: On loss 
of CCM Quorum: Ignore
Dec 27 00:24:39 router2 crmd: [25786]: info: do_pe_invoke: Query 28: 
Requesting the current CIB: S_POLICY_ENGINE
Dec 27 00:24:39 router2 pengine: [25785]: info: unpack_config: Node 
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Dec 27 00:24:39 router2 crmd: [25786]: info: do_pe_invoke_callback: 
Invoking the PE: ref=pe_calc-dc-1261891479-8, seq=2796, quorate=0
Dec 27 00:24:39 router2 cib: [26644]: info: write_cib_contents: Archived 
previous version as /var/lib/heartbeat/crm/cib-33.raw
Dec 27 00:24:39 router2 pengine: [25785]: info: determine_online_status: 
Node router2 is online
Dec 27 00:24:39 router2 pengine: [25785]: notice: group_print: Resource 
Group: router_cluster

Dec 27 00:24:39 router2 pengine: [25785]: notice: native_print:     
extIP       (ocf::heartbeat:IPaddr2):       Stopped
Dec 27 00:24:39 router2 pengine: [25785]: notice: native_print:     
intIP       (ocf::heartbeat:IPaddr2):       Stopped
Dec 27 00:24:39 router2 pengine: [25785]: notice: native_print:     
resRoute    (ocf::heartbeat:Route): Stopped
Dec 27 00:24:39 router2 pengine: [25785]: notice: native_print:     
firewall    (lsb:shorewall):        Stopped
Dec 27 00:24:39 router2 pengine: [25785]: notice: LogActions: Start 
extIP       (router2)
Dec 27 00:24:39 router2 pengine: [25785]: notice: LogActions: Start 
intIP       (router2)
Dec 27 00:24:39 router2 pengine: [25785]: notice: LogActions: Start 
resRoute    (router2)
Dec 27 00:24:39 router2 pengine: [25785]: notice: LogActions: Start 
firewall    (router2)

Dec 27 00:24:39 router2 crmd: [25786]: info: handle_response: pe_calc 
calculation pe_calc-dc-1261891479-7 is obsolete
Dec 27 00:24:39 router2 cib: [26644]: info: write_cib_contents: Wrote 
version 0.136.0 of the CIB to disk (digest: 
9e93522d5931f5831f51236994e76c0f)
Dec 27 00:24:39 router2 pengine: [25785]: info: process_pe_message: 
Transition 0: PEngine Input stored in: /var/lib/pengine/pe-input-31.bz2
Dec 27 00:24:39 router2 pengine: [25785]: notice: unpack_config: On loss 
of CCM Quorum: Ignore
Dec 27 00:24:39 router2 pengine: [25785]: info: unpack_config: Node 
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Dec 27 00:24:39 router2 pengine: [25785]: info: determine_online_status: 
Node router2 is online
Dec 27 00:24:39 router2 pengine: [25785]: notice: group_print: Resource 
Group: router_cluster
Dec 27 00:24:39 router2 pengine: [25785]: notice: native_print:     
extIP       (ocf::heartbeat:IPaddr2):       Stopped
Dec 27 00:24:39 router2 pengine: [25785]: notice: native_print:     
intIP       (ocf::heartbeat:IPaddr2):       Stopped
Dec 27 00:24:39 router2 pengine: [25785]: notice: native_print:     
resRoute    (ocf::heartbeat:Route): Stopped
Dec 27 00:24:39 router2 pengine: [25785]: notice: native_print:     
firewall    (lsb:shorewall):        Stopped
Dec 27 00:24:39 router2 pengine: [25785]: notice: LogActions: Start 
extIP       (router2)
Dec 27 00:24:39 router2 pengine: [25785]: notice: LogActions: Start 
intIP       (router2)
Dec 27 00:24:39 router2 pengine: [25785]: notice: LogActions: Start 
resRoute    (router2)
Dec 27 00:24:39 router2 pengine: [25785]: notice: LogActions: Start 
firewall    (router2)
Dec 27 00:24:39 router2 crmd: [25786]: info: do_state_transition: State 
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Dec 27 00:24:39 router2 cib: [26644]: info: retrieveCib: Reading cluster 
configuration from: /var/lib/heartbeat/crm/cib.ciJ5oX (digest: 
/var/lib/heartbeat/crm/cib.zAvXPr)
Dec 27 00:24:39 router2 crmd: [25786]: info: unpack_graph: Unpacked 
transition 1: 19 actions in 19 synapses
Dec 27 00:24:39 router2 crmd: [25786]: info: do_te_invoke: Processing 
graph 1 (ref=pe_calc-dc-1261891479-8) derived from 
/var/lib/pengine/pe-input-32.bz2
Dec 27 00:24:39 router2 crmd: [25786]: info: te_rsc_command: Initiating 
action 4: monitor extIP_monitor_0 on router2 (local)
Dec 27 00:24:39 router2 crmd: [25786]: info: do_lrm_rsc_op: Performing 
key=4:1:7:804c8e68-f81f-4d16-b9ab-6c6a4ea41b57 op=extIP_monitor_0 )
Dec 27 00:24:39 router2 lrmd: [25783]: info: rsc:extIP:2: monitor
Dec 27 00:24:39 router2 crmd: [25786]: info: te_rsc_command: Initiating 
action 5: monitor intIP_monitor_0 on router2 (local)
Dec 27 00:24:39 router2 crmd: [25786]: info: do_lrm_rsc_op: Performing 
key=5:1:7:804c8e68-f81f-4d16-b9ab-6c6a4ea41b57 op=intIP_monitor_0 )
Dec 27 00:24:39 router2 lrmd: [25783]: info: rsc:intIP:3: monitor
Dec 27 00:24:39 router2 pengine: [25785]: info: process_pe_message: 
Transition 1: PEngine Input stored in: /var/lib/pengine/pe-input-32.bz2
Dec 27 00:24:39 router2 crmd: [25786]: info: te_rsc_command: Initiating 
action 6: monitor resRoute_monitor_0 on router2 (local)
Dec 27 00:24:39 router2 crmd: [25786]: info: do_lrm_rsc_op: Performing 
key=6:1:7:804c8e68-f81f-4d16-b9ab-6c6a4ea41b57 op=resRoute_monitor_0 )
Dec 27 00:24:39 router2 lrmd: [25783]: info: rsc:resRoute:4: monitor
Dec 27 00:24:39 router2 crmd: [25786]: info: te_rsc_command: Initiating 
action 7: monitor firewall_monitor_0 on router2 (local)
Dec 27 00:24:39 router2 Route[26705]: ERROR: Gateway address 
24.227.124.157 is unreachable.
Dec 27 00:24:39 router2 lrmd: [25783]: notice: lrmd_rsc_new(): No 
lrm_rprovider field in message
Dec 27 00:24:39 router2 lrmd: [25783]: info: RA output: 
(resRoute:monitor:stderr) 2009/12/27_00:24:39 ERROR: Gateway address 
24.227.124.157 is unreachable.
Dec 27 00:24:39 router2 crmd: [25786]: info: do_lrm_rsc_op: Performing 
key=7:1:7:804c8e68-f81f-4d16-b9ab-6c6a4ea41b57 op=firewall_monitor_0 )
Dec 27 00:24:39 router2 lrmd: [25783]: info: rsc:firewall:5: monitor

Dec 27 00:24:40 router2 crmd: [25786]: info: process_lrm_event: LRM 
operation extIP_monitor_0 (call=2, rc=7, cib-update=29, confirmed=true) 
complete not running
Dec 27 00:24:40 router2 crmd: [25786]: info: process_lrm_event: LRM 
operation intIP_monitor_0 (call=3, rc=7, cib-update=30, confirmed=true) 
complete not running
Dec 27 00:24:40 router2 crmd: [25786]: info: process_lrm_event: LRM 
operation resRoute_monitor_0 (call=4, rc=5, cib-update=31, 
confirmed=true) complete not installed
Dec 27 00:24:40 router2 crmd: [25786]: info: process_lrm_event: LRM 
operation firewall_monitor_0 (call=5, rc=7, cib-update=32, 
confirmed=true) complete not running
Dec 27 00:24:40 router2 crmd: [25786]: info: match_graph_event: Action 
extIP_monitor_0 (4) confirmed on router2 (rc=0)
Dec 27 00:24:40 router2 crmd: [25786]: info: match_graph_event: Action 
intIP_monitor_0 (5) confirmed on router2 (rc=0)
Dec 27 00:24:40 router2 crmd: [25786]: WARN: status_from_rc: Action 6 
(resRoute_monitor_0) on router2 failed (target: 7 vs. rc: 5): Error
Dec 27 00:24:40 router2 crmd: [25786]: info: abort_transition_graph: 
match_graph_event:272 - Triggered transition abort (complete=0, 
tag=lrm_rsc_op, id=resRoute_monitor_0, 
magic=0:5;6:1:7:804c8e68-f81f-4d16-b9ab-6c6a4ea41b57, cib=0.136.4) : 
Event failed

Dec 27 00:24:40 router2 crmd: [25786]: info: update_abort_priority: 
Abort priority upgraded from 0 to 1
Dec 27 00:24:40 router2 crmd: [25786]: info: update_abort_priority: 
Abort action done superceeded by restart
Dec 27 00:24:40 router2 crmd: [25786]: info: match_graph_event: Action 
resRoute_monitor_0 (6) confirmed on router2 (rc=4)
Dec 27 00:24:40 router2 crmd: [25786]: info: match_graph_event: Action 
firewall_monitor_0 (7) confirmed on router2 (rc=0)
Dec 27 00:24:40 router2 crmd: [25786]: info: te_rsc_command: Initiating 
action 3: probe_complete probe_complete on router2 (local) - no waiting
Dec 27 00:24:40 router2 crmd: [25786]: info: run_graph: 
====================================================
Dec 27 00:24:40 router2 attrd: [25784]: info: find_hash_entry: Creating 
hash entry for probe_complete
Dec 27 00:24:40 router2 crmd: [25786]: notice: run_graph: Transition 1 
(Complete=5, Pending=0, Fired=0, Skipped=14, Incomplete=0, 
Source=/var/lib/pengine/pe-input-32.bz2): Stopped
Dec 27 00:24:40 router2 attrd: [25784]: info: attrd_trigger_update: 
Sending flush op to all hosts for: probe_complete (true)
Dec 27 00:24:40 router2 crmd: [25786]: info: te_graph_trigger: 
Transition 1 is now complete
Dec 27 00:24:40 router2 attrd: [25784]: info: attrd_perform_update: Sent 
update 8: probe_complete=true
Dec 27 00:24:40 router2 crmd: [25786]: info: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec 27 00:24:40 router2 crmd: [25786]: info: do_state_transition: All 1 
cluster nodes are eligible to run resources.
Dec 27 00:24:40 router2 crmd: [25786]: info: do_pe_invoke: Query 33: 
Requesting the current CIB: S_POLICY_ENGINE
Dec 27 00:24:40 router2 crmd: [25786]: info: abort_transition_graph: 
te_update_diff:146 - Triggered transition abort (complete=1, 
tag=transient_attributes, id=router2, magic=NA, cib=0.136.6) : Transient 
attribute: update
Dec 27 00:24:40 router2 crmd: [25786]: info: do_pe_invoke_callback: 
Invoking the PE: ref=pe_calc-dc-1261891480-14, seq=2796, quorate=0
Dec 27 00:24:40 router2 pengine: [25785]: notice: unpack_config: On loss 
of CCM Quorum: Ignore
Dec 27 00:24:40 router2 crmd: [25786]: info: do_pe_invoke: Query 34: 
Requesting the current CIB: S_POLICY_ENGINE
Dec 27 00:24:40 router2 pengine: [25785]: info: unpack_config: Node 
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Dec 27 00:24:40 router2 crmd: [25786]: info: do_pe_invoke_callback: 
Invoking the PE: ref=pe_calc-dc-1261891480-15, seq=2796, quorate=0
Dec 27 00:24:40 router2 pengine: [25785]: info: determine_online_status: 
Node router2 is online

Dec 27 00:24:40 router2 pengine: [25785]: info: unpack_rsc_op: 
resRoute_monitor_0 on router2 returned 5 (not installed) instead of the 
expected value: 7 (not running)
Dec 27 00:24:40 router2 pengine: [25785]: ERROR: unpack_rsc_op: Hard 
error - resRoute_monitor_0 failed with rc=5: Preventing resRoute from 
re-starting on router2

Dec 27 00:24:40 router2 pengine: [25785]: notice: group_print: Resource 
Group: router_cluster
Dec 27 00:24:40 router2 pengine: [25785]: notice: native_print:     
extIP       (ocf::heartbeat:IPaddr2):       Stopped
Dec 27 00:24:40 router2 pengine: [25785]: notice: native_print:     
intIP       (ocf::heartbeat:IPaddr2):       Stopped
Dec 27 00:24:40 router2 pengine: [25785]: notice: native_print:     
resRoute    (ocf::heartbeat:Route): Stopped
Dec 27 00:24:40 router2 pengine: [25785]: notice: native_print:     
firewall    (lsb:shorewall):        Stopped
Dec 27 00:24:40 router2 pengine: [25785]: info: native_merge_weights: 
extIP: Rolling back scores from resRoute
Dec 27 00:24:40 router2 pengine: [25785]: info: native_merge_weights: 
intIP: Rolling back scores from resRoute
Dec 27 00:24:40 router2 pengine: [25785]: info: native_merge_weights: 
resRoute: Rolling back scores from firewall
Dec 27 00:24:40 router2 pengine: [25785]: WARN: native_color: Resource 
resRoute cannot run anywhere
Dec 27 00:24:40 router2 pengine: [25785]: WARN: native_color: Resource 
firewall cannot run anywhere
Dec 27 00:24:40 router2 pengine: [25785]: notice: LogActions: Start 
extIP       (router2)
Dec 27 00:24:40 router2 pengine: [25785]: notice: LogActions: Start 
intIP       (router2)
Dec 27 00:24:40 router2 pengine: [25785]: notice: LogActions: Leave 
resource resRoute   (Stopped)
Dec 27 00:24:40 router2 pengine: [25785]: notice: LogActions: Leave 
resource firewall   (Stopped)
Dec 27 00:24:40 router2 crmd: [25786]: info: handle_response: pe_calc 
calculation pe_calc-dc-1261891480-14 is obsolete
Dec 27 00:24:40 router2 pengine: [25785]: WARN: process_pe_message: 
Transition 2: WARNINGs found during PE processing. PEngine Input stored 
in: /var/lib/pengine/pe-warn-27.bz2
Dec 27 00:24:40 router2 pengine: [25785]: info: process_pe_message: 
Configuration WARNINGs found during PE processing.  Please run 
"crm_verify -L" to identify issues.
Dec 27 00:24:40 router2 pengine: [25785]: notice: unpack_config: On loss 
of CCM Quorum: Ignore
Dec 27 00:24:40 router2 pengine: [25785]: info: unpack_config: Node 
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Dec 27 00:24:40 router2 pengine: [25785]: info: determine_online_status: 
Node router2 is online
Dec 27 00:24:40 router2 pengine: [25785]: info: unpack_rsc_op: 
resRoute_monitor_0 on router2 returned 5 (not installed) instead of the 
expected value: 7 (not running)
Dec 27 00:24:40 router2 pengine: [25785]: ERROR: unpack_rsc_op: Hard 
error - resRoute_monitor_0 failed with rc=5: Preventing resRoute from 
re-starting on router2
Dec 27 00:24:40 router2 pengine: [25785]: notice: group_print: Resource 
Group: router_cluster
Dec 27 00:24:40 router2 pengine: [25785]: notice: native_print:     
extIP       (ocf::heartbeat:IPaddr2):       Stopped
Dec 27 00:24:40 router2 pengine: [25785]: notice: native_print:     
intIP       (ocf::heartbeat:IPaddr2):       Stopped
Dec 27 00:24:40 router2 pengine: [25785]: notice: native_print:     
resRoute    (ocf::heartbeat:Route): Stopped
Dec 27 00:24:40 router2 pengine: [25785]: notice: native_print:     
firewall    (lsb:shorewall):        Stopped
Dec 27 00:24:40 router2 pengine: [25785]: info: native_merge_weights: 
extIP: Rolling back scores from resRoute
Dec 27 00:24:40 router2 pengine: [25785]: info: native_merge_weights: 
intIP: Rolling back scores from resRoute
Dec 27 00:24:40 router2 pengine: [25785]: info: native_merge_weights: 
resRoute: Rolling back scores from firewall
Dec 27 00:24:40 router2 pengine: [25785]: WARN: native_color: Resource 
resRoute cannot run anywhere
Dec 27 00:24:40 router2 pengine: [25785]: WARN: native_color: Resource 
firewall cannot run anywhere
Dec 27 00:24:40 router2 pengine: [25785]: notice: LogActions: Start 
extIP       (router2)
Dec 27 00:24:40 router2 pengine: [25785]: notice: LogActions: Start 
intIP       (router2)
Dec 27 00:24:40 router2 pengine: [25785]: notice: LogActions: Leave 
resource resRoute   (Stopped)
Dec 27 00:24:40 router2 pengine: [25785]: notice: LogActions: Leave 
resource firewall   (Stopped)
Dec 27 00:24:40 router2 crmd: [25786]: info: do_state_transition: State 
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Dec 27 00:24:40 router2 crmd: [25786]: info: unpack_graph: Unpacked 
transition 3: 3 actions in 3 synapses
Dec 27 00:24:40 router2 pengine: [25785]: WARN: process_pe_message: 
Transition 3: WARNINGs found during PE processing. PEngine Input stored 
in: /var/lib/pengine/pe-warn-28.bz2
Dec 27 00:24:40 router2 crmd: [25786]: info: do_te_invoke: Processing 
graph 3 (ref=pe_calc-dc-1261891480-15) derived from 
/var/lib/pengine/pe-warn-28.bz2
Dec 27 00:24:40 router2 pengine: [25785]: info: process_pe_message: 
Configuration WARNINGs found during PE processing.  Please run 
"crm_verify -L" to identify issues.
Dec 27 00:24:40 router2 crmd: [25786]: info: te_pseudo_action: Pseudo 
action 6 fired and confirmed
Dec 27 00:24:40 router2 crmd: [25786]: info: te_rsc_command: Initiating 
action 4: start extIP_start_0 on router2 (local)
Dec 27 00:24:40 router2 crmd: [25786]: info: do_lrm_rsc_op: Performing 
key=4:3:0:804c8e68-f81f-4d16-b9ab-6c6a4ea41b57 op=extIP_start_0 )
Dec 27 00:24:40 router2 lrmd: [25783]: info: rsc:extIP:6: start
Dec 27 00:24:40 router2 IPaddr2[26740]: INFO: ip -f inet addr add 
24.227.124.158/30 brd 24.227.124.159 dev net0
Dec 27 00:24:40 router2 lrmd: [25783]: info: RA output: 
(extIP:start:stderr) 2009/12/27_00:24:40 INFO: ip -f inet addr add 
24.227.124.158/30 brd 24.227.124.159 dev net0
Dec 27 00:24:40 router2 IPaddr2[26740]: INFO: ip link set net0 up
Dec 27 00:24:40 router2 lrmd: [25783]: info: RA output: 
(extIP:start:stderr) 2009/12/27_00:24:40 INFO: ip link set net0 up
Dec 27 00:24:40 router2 IPaddr2[26740]: INFO: 
/usr/lib/heartbeat/send_arp -i 200 -r 5 -p 
/var/run/heartbeat/rsctmp/send_arp/send_arp-24.227.124.158 net0 
24.227.124.158 auto not_used not_used
Dec 27 00:24:40 router2 kernel: net0: link down
Dec 27 00:24:40 router2 lrmd: [25783]: info: RA output: 
(extIP:start:stderr) 2009/12/27_00:24:40 INFO: 
/usr/lib/heartbeat/send_arp -i 200 -r 5 -p 
/var/run/heartbeat/rsctmp/send_arp/send_arp-24.227.124.158 net0 
24.227.124.158 auto not_used not_used

Dec 27 00:24:40 router2 kernel: ADDRCONF(NETDEV_UP): net0: link is not ready

Dec 27 00:24:41 router2 crmd: [25786]: info: process_lrm_event: LRM 
operation extIP_start_0 (call=6, rc=0, cib-update=35, confirmed=true) 
complete ok
Dec 27 00:24:41 router2 crmd: [25786]: info: match_graph_event: Action 
extIP_start_0 (4) confirmed on router2 (rc=0)
Dec 27 00:24:41 router2 crmd: [25786]: info: te_rsc_command: Initiating 
action 5: start intIP_start_0 on router2 (local)
Dec 27 00:24:41 router2 crmd: [25786]: info: do_lrm_rsc_op: Performing 
key=5:3:0:804c8e68-f81f-4d16-b9ab-6c6a4ea41b57 op=intIP_start_0 )
Dec 27 00:24:41 router2 lrmd: [25783]: info: rsc:intIP:7: start
Dec 27 00:24:41 router2 IPaddr2[26795]: INFO: ip -f inet addr add 
192.168.0.1/16 brd 192.168.255.255 dev lan0
Dec 27 00:24:41 router2 lrmd: [25783]: info: RA output: 
(intIP:start:stderr) 2009/12/27_00:24:41 INFO: ip -f inet addr add 
192.168.0.1/16 brd 192.168.255.255 dev lan0
Dec 27 00:24:41 router2 IPaddr2[26795]: INFO: ip link set lan0 up
Dec 27 00:24:41 router2 lrmd: [25783]: info: RA output: 
(intIP:start:stderr) 2009/12/27_00:24:41 INFO: ip link set lan0 up
Dec 27 00:24:41 router2 IPaddr2[26795]: INFO: 
/usr/lib/heartbeat/send_arp -i 200 -r 5 -p 
/var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.0.1 lan0 192.168.0.1 
auto not_used not_used
Dec 27 00:24:41 router2 lrmd: [25783]: info: RA output: 
(intIP:start:stderr) 2009/12/27_00:24:41 INFO: 
/usr/lib/heartbeat/send_arp -i 200 -r 5 -p 
/var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.0.1 lan0 192.168.0.1 
auto not_used not_used
Dec 27 00:24:41 router2 crmd: [25786]: info: process_lrm_event: LRM 
operation intIP_start_0 (call=7, rc=0, cib-update=36, confirmed=true) 
complete ok
Dec 27 00:24:41 router2 crmd: [25786]: info: match_graph_event: Action 
intIP_start_0 (5) confirmed on router2 (rc=0)
Dec 27 00:24:41 router2 crmd: [25786]: info: run_graph: 
====================================================
Dec 27 00:24:41 router2 crmd: [25786]: notice: run_graph: Transition 3 
(Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pengine/pe-warn-28.bz2): Complete
Dec 27 00:24:41 router2 crmd: [25786]: info: te_graph_trigger: 
Transition 3 is now complete
Dec 27 00:24:41 router2 crmd: [25786]: info: notify_crmd: Transition 3 
status: done - <null>
Dec 27 00:24:41 router2 crmd: [25786]: info: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec 27 00:24:41 router2 crmd: [25786]: info: do_state_transition: 
Starting PEngine Recheck Timer

I have added blank lines in the logs to separate out the specific event
segments that show the problem. One in particular, near the top, is what's
causing the entire resRoute resource to fail:

Dec 27 00:24:40 router2 crmd: [25786]: info: process_lrm_event: LRM 
operation resRoute_monitor_0 (call=4, rc=5, cib-update=31, 
confirmed=true) complete not installed
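
Judging from the Route agent's ERROR line in the log ("Gateway address
24.227.124.157 is unreachable."), the probe appears to run before extIP has
configured net0. A quick manual check of that theory (just a sketch, reusing
Michael's exports, and assuming net0 carries no address at the time):

export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_destination="0.0.0.0/0"
export OCF_RESKEY_gateway="24.227.124.157"
/usr/lib/ocf/resource.d/heartbeat/Route monitor; echo $?
# with no address on net0, this should print the same "unreachable"
# error and exit 5 (not installed) rather than 7 (not running)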

This is with OpenSUSE 11.1 using the ha-cluster repository, with pacemaker
1.0.5, cluster-glue 1.0, heartbeat 3.0.0, openais 0.80.5, and ha-resources
1.0 (which I believe is the heartbeat 3.99.x stuff), so these are fairly
current versions now.

I'd been building my setup with SUSE Studio, hand-picking the packages
needed.

Any thoughts?

--
Eric Renfro




