[Pacemaker] Failed start of a resource after a Debian upgrade

Michal Vyoral michal.vyoral at chmi.cz
Wed Jan 25 10:35:39 EST 2012


Hi Dejan,

On Tue, Jan 24, 2012 at 11:52:20PM +0100, Dejan Muhamedagic wrote:
> Hi,
> 
> On Tue, Jan 24, 2012 at 06:31:54PM +0100, Michal Vyoral wrote:
> > Hello,
> > we had a cluster of two nodes both running Debian 5.0, each with two resources
> > IPaddr2 and apache managed by pacemaker 1.0.9.1. After upgrading
> > one node from Debian 5.0 to 6.0 we have a problem starting the
> > apache resource on the upgraded node. Here are the details:
> > 
> > Versions of heartbeat and pacemaker before the upgrade:
> > pr-iso1:~# dpkg -l pacemaker heartbeat
> > Desired=Unknown/Install/Remove/Purge/Hold
> > | Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
> > |/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
> > ||/ Name           Version        Description
> > +++-==============-==============-============================================
> > ii  heartbeat      1:3.0.3-2~bpo5 Subsystem for High-Availability Linux
> > ii  pacemaker      1.0.9.1+hg1562 HA cluster resource manager
> > 
> > Versions of heartbeat and pacemaker after the upgrade:
> > pr-iso2:~# dpkg -l pacemaker heartbeat
> > Desired=Unknown/Install/Remove/Purge/Hold
> > | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> > |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> > ||/ Name           Version        Description
> > +++-==============-==============-============================================
> > ii  heartbeat      1:3.0.3-2      Subsystem for High-Availability Linux
> > ii  pacemaker      1.0.9.1+hg1562 HA cluster resource manager
> > 
> > Status of the resources on the upgraded node:
> > pr-iso2:~# crm_mon
> > ============
> > Last updated: Tue Jan 24 10:14:12 2012
> > Stack: Heartbeat
> > Current DC: pr-iso2 (511079a9-0f71-4537-bdf9-07714b454441) - partition with quorum
> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> > 2 Nodes configured, unknown expected votes
> > 2 Resources configured.
> > ============
> > 
> > Online: [ pr-iso2 ]
> > OFFLINE: [ pr-iso1 ]
> > 
> > ClusterIP       (ocf::heartbeat:IPaddr2):       Started pr-iso2
> > 
> > Failed actions:
> >     RTWeb_start_0 (node=pr-iso2, call=7, rc=1, status=complete): unknown error
> > 
> > Status of the resources on the non-upgraded node:
> > pr-iso1:~# crm_mon
> > ============
> > Last updated: Tue Jan 24 17:08:22 2012
> > Stack: Heartbeat
> > Current DC: pr-iso1 (014268aa-f234-4789-b4a1-0053cf4e61b9) - partition with quorum
> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> > 2 Nodes configured, unknown expected votes
> > 2 Resources configured.
> > ============
> > 
> > Online: [ pr-iso1 pr-iso2 ]
> > 
> > ClusterIP       (ocf::heartbeat:IPaddr2):       Started pr-iso1
> > RTWeb   (ocf::heartbeat:apache):        Started pr-iso1
> > 
> > Configuration of the resources:
> > pr-iso1:~# crm configure show
> > node $id="014268aa-f234-4789-b4a1-0053cf4e61b9" pr-iso1
> > node $id="511079a9-0f71-4537-bdf9-07714b454441" pr-iso2
> > primitive ClusterIP ocf:heartbeat:IPaddr2 \
> >         params ip="10.5.75.83" cidr_netmask="24" \
> >         op monitor interval="30s"
> > primitive RTWeb ocf:heartbeat:apache \
> >         params configfile="/etc/apache2/apache2.conf" \
> >         op monitor interval="1min" \
> >         meta target-role="Started" is-managed="true"
> > colocation website-with-ip inf: RTWeb ClusterIP
> > order rtweb_after_clustrip inf: ClusterIP RTWeb
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
> >         cluster-infrastructure="Heartbeat" \
> >         stonith-enabled="false" \
> >         last-lrm-refresh="1327399494"
> > rsc_defaults $id="rsc-options" \
> >         resource-stickiness="100"
> > 
> > Records in the /var/log/ha-log related to RTWeb resource:
> > pr-iso2:~# grep RTWeb /var/log/ha-log
> > Jan 24 10:04:56 pr-iso2 crmd: [6130]: info: do_lrm_rsc_op: Performing key=7:76:7:41cbad9d-9090-4aba-bd6a-bf171077c74b op=RTWeb_monitor_0 )
> > Jan 24 10:04:56 pr-iso2 lrmd: [6127]: info: rsc:RTWeb:4: probe
> > Jan 24 10:04:56 pr-iso2 crmd: [6130]: info: process_lrm_event: LRM operation RTWeb_monitor_0 (call=4, rc=7, cib-update=13, confirmed=true) not running
> > Jan 24 10:12:48 pr-iso2 crmd: [6130]: info: do_lrm_rsc_op: Performing key=11:77:0:41cbad9d-9090-4aba-bd6a-bf171077c74b op=RTWeb_start_0 )
> > Jan 24 10:12:48 pr-iso2 lrmd: [6127]: info: rsc:RTWeb:7: start
> 
> After this message there should be a bit more (look for "apache"
> or "lrmd"). Next resource agents are going to log the resource
> name too (RTWeb in this case). If you cannot find anything here,
> then the answer must be in the apache logs.
> 
> Thanks,
> 
> Dejan

Yes, you are right: here are the two lines that follow the one above:

  apache[9454]:   2012/01/24_10:12:49 INFO: apache not running
  apache[9454]:   2012/01/24_10:12:49 INFO: waiting for apache /etc/apache2/apache2.conf to come up
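
If we understand the agent right, it starts Apache and then keeps polling a status
URL until it answers, which is what the "waiting for apache ... to come up" message
is about. To see what it is actually waiting for, we could run the agent by hand with
the same parameter (a rough sketch, assuming the stock agent under /usr/lib/ocf and
its default status URL):

  pr-iso2:~# export OCF_ROOT=/usr/lib/ocf
  pr-iso2:~# export OCF_RESKEY_configfile=/etc/apache2/apache2.conf
  pr-iso2:~# sh -x /usr/lib/ocf/resource.d/heartbeat/apache start; echo rc=$?
  pr-iso2:~# wget -q -O - http://localhost/server-status    # what the agent polls by default, if we read it right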

There are no records in /var/log/apache2/error.log that give any clue, see:

  pr-iso2:/var/log/apache2# cat error.log
  [Tue Jan 24 11:12:50 2012] [notice] Apache/2.2.16 (Debian) PHP/5.3.3-7+squeeze3 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
  [Tue Jan 24 11:13:08 2012] [notice] caught SIGTERM, shutting down
  [Wed Jan 25 13:09:02 2012] [notice] Apache/2.2.16 (Debian) PHP/5.3.3-7+squeeze3 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
  [Wed Jan 25 13:09:21 2012] [notice] caught SIGTERM, shutting down

An interesting thing: our nodes should use UTC, but after the upgrade we noticed
that the time on the upgraded node was our local time (UTC + 1).
I have set the system time back to UTC, but Apache still uses the local time in the log.
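
We assume Apache simply inherited the old timezone when it was started; reconfiguring
the timezone and restarting Apache should bring the log timestamps back in line, roughly:

  pr-iso2:~# dpkg-reconfigure tzdata        # reselect Etc/UTC here
  pr-iso2:~# /etc/init.d/apache2 restart    # running processes keep the old timezone until restarted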

We have tried to start Apache on the upgraded node by itself:

1. we have modified the file /etc/apache2/ports.conf so that
Apache listens on the physical address
2. we have run the command '/etc/init.d/apache2 start'
3. we have downloaded an index.html page (roughly as sketched below)
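
The fetch itself was just a plain HTTP request, e.g. (taking 10.5.75.82 from the
redirect warning below to be the physical address):

  $ wget -O /dev/null http://10.5.75.82/index.html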

Here is the record in the error log:

 [Wed Jan 25 13:28:11 2012] [notice] Apache/2.2.16 (Debian) PHP/5.3.3-7+squeeze3 with Suhosin-Patch mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
 [Wed Jan 25 13:28:27 2012] [warn] [client 10.5.77.29] incomplete redirection target of '/rt/' for URI '/' modified to 'http://10.5.75.82/rt/'

So Apache can run on its own.

Before the upgrade we had made some minor changes to apache2.conf
on the active node, but not on the passive node. We have reverted
those changes, but the resource still fails to start; see the tail of the ha-log
on the upgraded node:

Jan 25 14:04:11 pr-iso2 heartbeat: [16026]: WARN: node pr-iso1: is dead
Jan 25 14:04:11 pr-iso2 heartbeat: [16026]: info: Link pr-iso1:eth0 dead.
Jan 25 14:04:11 pr-iso2 crmd: [16043]: notice: crmd_ha_status_callback: Status update: Node pr-iso1 now has status [dead] (DC=false)
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: crm_update_peer_proc: pr-iso1.ais is now offline
Jan 25 14:04:11 pr-iso2 ccm: [16038]: info: Break tie for 2 nodes cluster
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: mem_handle_event: no mbr_track info
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: mem_handle_event: no mbr_track info
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=3)
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
Jan 25 14:04:11 pr-iso2 cib: [16039]: info: crm_update_peer: Node pr-iso1: id=0 state=lost (new) addr=(null) votes=-1 born=1 seen=2 proc=00000000000000000000000000000202
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=3)
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: ccm_event_detail: NEW MEMBERSHIP: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: ccm_event_detail:  CURRENT: pr-iso2 [nodeid=1, born=3]
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: ccm_event_detail:  LOST:    pr-iso1 [nodeid=0, born=1]
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: crm_update_peer: Node pr-iso1: id=0 state=lost (new) addr=(null) votes=-1 born=1 seen=2 proc=00000000000000000000000000000000
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: do_te_control: Registering TE UUID: 6fe179f3-2f76-4b99-b53a-b449b0bf3008
Jan 25 14:04:11 pr-iso2 crmd: [16043]: WARN: cib_client_add_notify_callback: Callback already present
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: set_graph_functions: Setting custom graph functions
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
Jan 25 14:04:11 pr-iso2 crmd: [16043]: info: start_subsystem: Starting sub-system "pengine"
Jan 25 14:04:11 pr-iso2 pengine: [16392]: info: Invoked: /usr/lib/heartbeat/pengine 
Jan 25 14:04:11 pr-iso2 pengine: [16392]: info: main: Starting pengine
Jan 25 14:04:14 pr-iso2 crmd: [16043]: info: do_dc_takeover: Taking over DC status for this partition
Jan 25 14:04:14 pr-iso2 cib: [16039]: info: cib_process_readwrite: We are now in R/W mode
Jan 25 14:04:14 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/17, version=0.63.9): ok (rc=0)
Jan 25 14:04:14 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/18, version=0.63.9): ok (rc=0)
Jan 25 14:04:14 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/20, version=0.63.9): ok (rc=0)
Jan 25 14:04:14 pr-iso2 crmd: [16043]: info: join_make_offer: Making join offers based on membership 3
Jan 25 14:04:14 pr-iso2 crmd: [16043]: info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks
Jan 25 14:04:14 pr-iso2 crmd: [16043]: info: te_connect_stonith: Attempting connection to fencing daemon...
Jan 25 14:04:14 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/22, version=0.63.9): ok (rc=0)
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: te_connect_stonith: Connected
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: config_query_callback: Checking for expired actions every 900000ms
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: update_dc: Set DC to pr-iso2 (3.0.1)
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: do_state_transition: All 1 cluster nodes responded to the join offer.
Jan 25 14:04:15 pr-iso2 crmd: [16043]: info: do_dc_join_finalize: join-1: Syncing the CIB from pr-iso2 to the rest of the cluster
Jan 25 14:04:15 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/25, version=0.63.9): ok (rc=0)
Jan 25 14:04:15 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/26, version=0.63.9): ok (rc=0)
Jan 25 14:04:17 pr-iso2 crmd: [16043]: info: do_dc_join_ack: join-1: Updating node state to member for pr-iso2
Jan 25 14:04:17 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='pr-iso2']/lrm (origin=local/crmd/27, version=0.63.10): ok (rc=0)
Jan 25 14:04:17 pr-iso2 crmd: [16043]: info: erase_xpath_callback: Deletion of "//node_state[@uname='pr-iso2']/lrm": ok (rc=0)
Jan 25 14:04:17 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Jan 25 14:04:17 pr-iso2 crmd: [16043]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: crm_update_quorum: Updating quorum status to true (call=31)
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_local_callback: Sending full refresh (origin=crmd)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: abort_transition_graph: do_te_invoke:191 - Triggered transition abort (complete=1) : Peer Cancelled
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (<null>)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 32: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:18 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/29, version=0.63.11): ok (rc=0)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: WARN: match_down_event: No match for shutdown action on 014268aa-f234-4789-b4a1-0053cf4e61b9
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: te_update_diff: Stonith/shutdown of 014268aa-f234-4789-b4a1-0053cf4e61b9 not matched
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: abort_transition_graph: te_update_diff:191 - Triggered transition abort (complete=1, tag=node_state, id=014268aa-f234-4789-b4a1-0053cf4e61b9, magic=NA, cib=0.63.12) : Node failure
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 33: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:18 pr-iso2 cib: [16039]: info: log_data_element: cib:diff: - <cib dc-uuid="014268aa-f234-4789-b4a1-0053cf4e61b9" admin_epoch="0" epoch="63" num_updates="12" />
Jan 25 14:04:18 pr-iso2 cib: [16039]: info: log_data_element: cib:diff: + <cib dc-uuid="511079a9-0f71-4537-bdf9-07714b454441" admin_epoch="0" epoch="64" num_updates="1" />
Jan 25 14:04:18 pr-iso2 cib: [16039]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/31, version=0.64.1): ok (rc=0)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: need_abort: Aborting on change to admin_epoch
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 34: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-RTWeb (INFINITY)
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_perform_update: Sent update 22: fail-count-RTWeb=INFINITY
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-RTWeb (1327500241)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke_callback: Invoking the PE: query=34, ref=pe_calc-dc-1327500258-12, seq=3, quorate=1
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_perform_update: Sent update 25: last-failure-RTWeb=1327500241
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: determine_online_status: Node pr-iso2 is online
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: terminate (<null>)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: WARN: unpack_rsc_op: Processing failed op RTWeb_start_0 on pr-iso2: unknown error (1)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: native_print: ClusterIP       (ocf::heartbeat:IPaddr2):       Started pr-iso2
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=511079a9-0f71-4537-bdf9-07714b454441, magic=NA, cib=0.64.2) : Transient attribute: update
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: native_print: RTWeb   (ocf::heartbeat:apache):        Started pr-iso2 FAILED
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=511079a9-0f71-4537-bdf9-07714b454441, magic=NA, cib=0.64.3) : Transient attribute: update
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: RecurringOp:  Start recurring monitor (60s) for RTWeb on pr-iso2
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 35: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: LogActions: Leave resource ClusterIP  (Started pr-iso2)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: LogActions: Recover resource RTWeb    (Started pr-iso2)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 36: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: handle_response: pe_calc calculation pe_calc-dc-1327500258-12 is obsolete
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_pe_invoke_callback: Invoking the PE: query=36, ref=pe_calc-dc-1327500258-13, seq=3, quorate=1
Jan 25 14:04:18 pr-iso2 cib: [16405]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-90.raw
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: process_pe_message: Transition 0: PEngine Input stored in: /var/lib/pengine/pe-input-6324.bz2
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: determine_online_status: Node pr-iso2 is online
Jan 25 14:04:18 pr-iso2 pengine: [16392]: WARN: unpack_rsc_op: Processing failed op RTWeb_start_0 on pr-iso2: unknown error (1)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: native_print: ClusterIP       (ocf::heartbeat:IPaddr2):       Started pr-iso2
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: native_print: RTWeb   (ocf::heartbeat:apache):        Started pr-iso2 FAILED
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: get_failcount: RTWeb has failed INFINITY times on pr-iso2
Jan 25 14:04:18 pr-iso2 pengine: [16392]: WARN: common_apply_stickiness: Forcing RTWeb away from pr-iso2 after 1000000 failures (max=1000000)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: native_merge_weights: ClusterIP: Rolling back scores from RTWeb
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: native_color: Resource RTWeb cannot run anywhere
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: LogActions: Leave resource ClusterIP  (Started pr-iso2)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: notice: LogActions: Stop resource RTWeb       (pr-iso2)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: unpack_graph: Unpacked transition 1: 2 actions in 2 synapses
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_te_invoke: Processing graph 1 (ref=pe_calc-dc-1327500258-13) derived from /var/lib/pengine/pe-input-6325.bz2
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: te_rsc_command: Initiating action 1: stop RTWeb_stop_0 on pr-iso2 (local)
Jan 25 14:04:18 pr-iso2 crmd: [16043]: info: do_lrm_rsc_op: Performing key=1:1:0:6fe179f3-2f76-4b99-b53a-b449b0bf3008 op=RTWeb_stop_0 )
Jan 25 14:04:18 pr-iso2 lrmd: [16040]: info: rsc:RTWeb:8: stop
Jan 25 14:04:18 pr-iso2 cib: [16405]: info: write_cib_contents: Wrote version 0.64.0 of the CIB to disk (digest: b1b8107c826e0e9f9d7e0dc5d7f17d77)
Jan 25 14:04:18 pr-iso2 cib: [16405]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.0nYAMe (digest: /var/lib/heartbeat/crm/cib.2WriED)
Jan 25 14:04:18 pr-iso2 pengine: [16392]: info: process_pe_message: Transition 1: PEngine Input stored in: /var/lib/pengine/pe-input-6325.bz2
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_ha_callback: flush message from pr-iso2
Jan 25 14:04:18 pr-iso2 attrd: [16042]: info: attrd_ha_callback: flush message from pr-iso2
Jan 25 14:04:19 pr-iso2 lrmd: [16040]: info: RA output: (RTWeb:stop:stderr) kill: 830: No such process
apache[16406]:  2012/01/25_14:04:19 INFO: Killing apache PID 16356
apache[16406]:  2012/01/25_14:04:19 INFO: apache stopped.
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: process_lrm_event: LRM operation RTWeb_stop_0 (call=8, rc=0, cib-update=37, confirmed=true) ok
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: match_graph_event: Action RTWeb_stop_0 (1) confirmed on pr-iso2 (rc=0)
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: te_pseudo_action: Pseudo action 3 fired and confirmed
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: run_graph: ====================================================
Jan 25 14:04:19 pr-iso2 crmd: [16043]: notice: run_graph: Transition 1 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-6325.bz2): Complete
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: te_graph_trigger: Transition 1 is now complete
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: notify_crmd: Transition 1 status: done - <null>
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 25 14:04:19 pr-iso2 crmd: [16043]: info: do_state_transition: Starting PEngine Recheck Timer
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/crmd process group 16043 with signal 15
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: crm_shutdown: Requesting shutdown
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown_req: Sending shutdown request to DC: pr-iso2
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: handle_shutdown_request: Creating shutdown request for pr-iso2 (state=S_POLICY_ENGINE)
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1327500299)
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: attrd_perform_update: Sent update 33: shutdown=1327500299
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=511079a9-0f71-4537-bdf9-07714b454441, magic=NA, cib=0.64.6) : Transient attribute: update
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_pe_invoke: Query 39: Requesting the current CIB: S_POLICY_ENGINE
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_pe_invoke_callback: Invoking the PE: query=39, ref=pe_calc-dc-1327500299-16, seq=3, quorate=1
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: determine_online_status: Node pr-iso2 is shutting down
Jan 25 14:04:59 pr-iso2 pengine: [16392]: WARN: unpack_rsc_op: Processing failed op RTWeb_start_0 on pr-iso2: unknown error (1)
Jan 25 14:04:59 pr-iso2 pengine: [16392]: notice: native_print: ClusterIP       (ocf::heartbeat:IPaddr2):       Started pr-iso2
Jan 25 14:04:59 pr-iso2 pengine: [16392]: notice: native_print: RTWeb   (ocf::heartbeat:apache):        Stopped 
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: get_failcount: RTWeb has failed INFINITY times on pr-iso2
Jan 25 14:04:59 pr-iso2 pengine: [16392]: WARN: common_apply_stickiness: Forcing RTWeb away from pr-iso2 after 1000000 failures (max=1000000)
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: native_merge_weights: ClusterIP: Rolling back scores from RTWeb
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: native_color: Resource ClusterIP cannot run anywhere
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: native_color: Resource RTWeb cannot run anywhere
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: stage6: Scheduling Node pr-iso2 for shutdown
Jan 25 14:04:59 pr-iso2 pengine: [16392]: notice: LogActions: Stop resource ClusterIP   (pr-iso2)
Jan 25 14:04:59 pr-iso2 pengine: [16392]: notice: LogActions: Leave resource RTWeb      (Stopped)
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: unpack_graph: Unpacked transition 2: 3 actions in 3 synapses
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_te_invoke: Processing graph 2 (ref=pe_calc-dc-1327500299-16) derived from /var/lib/pengine/pe-input-6326.bz2
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: te_rsc_command: Initiating action 5: stop ClusterIP_stop_0 on pr-iso2 (local)
Jan 25 14:04:59 pr-iso2 lrmd: [16040]: info: cancel_op: operation monitor[6] on ocf::IPaddr2::ClusterIP for client 16043, its parameters: CRM_meta_name=[monitor] cidr_netmask=[24] crm_feature_set=[3.0.1] CRM_meta_timeout=[20000] CRM_meta_interval=[30000] ip=[10.5.75.83]  cancelled
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_lrm_rsc_op: Performing key=5:2:0:6fe179f3-2f76-4b99-b53a-b449b0bf3008 op=ClusterIP_stop_0 )
Jan 25 14:04:59 pr-iso2 lrmd: [16040]: info: rsc:ClusterIP:9: stop
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: process_lrm_event: LRM operation ClusterIP_monitor_30000 (call=6, status=1, cib-update=0, confirmed=true) Cancelled
IPaddr2[16548]: 2012/01/25_14:04:59 INFO: IP status = ok, IP_CIP=
IPaddr2[16548]: 2012/01/25_14:04:59 INFO: ip -f inet addr delete 10.5.75.83/24 dev eth0
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: process_lrm_event: LRM operation ClusterIP_stop_0 (call=9, rc=0, cib-update=40, confirmed=true) ok
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: match_graph_event: Action ClusterIP_stop_0 (5) confirmed on pr-iso2 (rc=0)
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: te_pseudo_action: Pseudo action 2 fired and confirmed
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: te_crm_command: Executing crm-event (8): do_shutdown on pr-iso2
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: te_crm_command: crm-event (8) is a local shutdown
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: run_graph: ====================================================
Jan 25 14:04:59 pr-iso2 crmd: [16043]: notice: run_graph: Transition 2 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-6326.bz2): Complete
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: te_graph_trigger: Transition 2 is now complete
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_dc_release: DC role released
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: stop_subsystem: Sent -TERM to pengine: [16392]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_te_control: Transitioner is now inactive
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_te_control: Disconnecting STONITH...
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
Jan 25 14:04:59 pr-iso2 crmd: [16043]: notice: Not currently connected.
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: Terminating the pengine
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: stop_subsystem: Sent -TERM to pengine: [16392]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: Waiting for subsystems to exit
Jan 25 14:04:59 pr-iso2 crmd: [16043]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: All subsystems stopped, continuing
Jan 25 14:04:59 pr-iso2 crmd: [16043]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: Terminating the pengine
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: stop_subsystem: Sent -TERM to pengine: [16392]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: Waiting for subsystems to exit
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: All subsystems stopped, continuing
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: process_pe_message: Transition 2: PEngine Input stored in: /var/lib/pengine/pe-input-6326.bz2
Jan 25 14:04:59 pr-iso2 pengine: [16392]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: crmdManagedChildDied: Process pengine:[16392] exited (signal=0, exitcode=0)
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: pe_msg_dispatch: Received HUP from pengine:[16392]
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: pe_connection_destroy: Connection to the Policy Engine released
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_shutdown: All subsystems stopped, continuing
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_lrm_control: Disconnected from the LRM
Jan 25 14:04:59 pr-iso2 ccm: [16038]: info: client (pid=16043) removed from ccm
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_ha_control: Disconnected from Heartbeat
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_cib_control: Disconnecting CIB
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: cib_process_readwrite: We are now in R/O mode
Jan 25 14:04:59 pr-iso2 cib: [16039]: WARN: send_ipc_message: IPC Channel to 16043 is not connected
Jan 25 14:04:59 pr-iso2 crmd: [16043]: info: do_exit: [crmd] stopped (0)
Jan 25 14:04:59 pr-iso2 cib: [16039]: WARN: send_via_callback_channel: Delivery of reply to client 16043/74e07029-b533-4c17-bac2-a0a52149bc7a failed
Jan 25 14:04:59 pr-iso2 cib: [16039]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/attrd process group 16042 with signal 15
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: attrd_shutdown: Exiting
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: main: Exiting...
Jan 25 14:04:59 pr-iso2 attrd: [16042]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/stonithd process group 16041 with signal 15
Jan 25 14:04:59 pr-iso2 stonithd: [16041]: notice: /usr/lib/heartbeat/stonithd normally quit.
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/lrmd -r process group 16040 with signal 15
Jan 25 14:04:59 pr-iso2 lrmd: [16040]: info: lrmd is shutting down
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/cib process group 16039 with signal 15
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: cib_shutdown: Disconnected 0 clients
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: cib_process_disconnect: All clients disconnected...
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: terminate_cib: Exiting...
Jan 25 14:04:59 pr-iso2 cib: [16039]: info: main: Done
Jan 25 14:04:59 pr-iso2 ccm: [16038]: info: client (pid=16039) removed from ccm
Jan 25 14:04:59 pr-iso2 heartbeat: [16026]: info: killing /usr/lib/heartbeat/ccm process group 16038 with signal 15
Jan 25 14:04:59 pr-iso2 ccm: [16038]: info: received SIGTERM, going to shut down
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: killing HBFIFO process 16031 with signal 15
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: killing HBWRITE process 16032 with signal 15
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: killing HBREAD process 16033 with signal 15
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: killing HBWRITE process 16034 with signal 15
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: killing HBREAD process 16035 with signal 15
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: Core process 16033 exited. 5 remaining
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: Core process 16034 exited. 4 remaining
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: Core process 16031 exited. 3 remaining
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: Core process 16035 exited. 2 remaining
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: Core process 16032 exited. 1 remaining
Jan 25 14:05:00 pr-iso2 heartbeat: [16026]: info: pr-iso2 Heartbeat shutdown complete.
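
In any case, before the next start attempt we will probably have to clear the
fail-count, which is now at INFINITY and forcing RTWeb away from pr-iso2, e.g.
(a sketch using the tools shipped with pacemaker 1.0):

  pr-iso2:~# crm resource cleanup RTWeb    # or: crm_resource --cleanup --resource RTWeb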

Maybe the solution is to upgrade the active node and reconfigure Pacemaker? Thank you.

Best regards,
Michal Vyoral



