[Pacemaker] [Problem] It is judged that a stopping resource is starting.
    Andrew Beekhof 
    andrew at beekhof.net
       
    Tue Feb 21 01:42:53 UTC 2012
    
    
  
On Fri, Feb 17, 2012 at 10:49 AM,  <renayama19661014 at ybb.ne.jp> wrote:
> Hi Andrew,
>
> Thank you for your comment.
>
>> I'm getting to this soon, really :-)
>> First it was corosync 2.0 stuff, so that /something/ in fedora-17
>> works, then fixing everything I broke when adding corosync 2.0
>> support.
>
> All right!
>
> I will wait for your answer.
I somehow missed that the failure was "not configured":

Failed actions:
    prmVIP_monitor_0 (node=rh57-1, call=2, rc=6, status=complete): not configured

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
lists rc=6 as fatal, but I believe we changed that behaviour (the
stopping aspect) in the PE, because an agent that is not configured
also has insufficient information to stop the service.
A failed stop results in the node being fenced, the resource being
probed again, the probe failing along with the subsequent stop, the
node being fenced again, and so on.
So two things:

 * this log message should include the human-readable version of rc=6:
   Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster
 * and the docs need to be updated.
>
> Best Regards,
> Hideo Yamauchi.
>
> --- On Thu, 2012/2/16, Andrew Beekhof <andrew at beekhof.net> wrote:
>
>> Sorry!
>>
>> I'm getting to this soon, really :-)
>> First it was corosync 2.0 stuff, so that /something/ in fedora-17
>> works, then fixing everything I broke when adding corosync 2.0
>> support.
>>
>> On Tue, Feb 14, 2012 at 11:20 AM,  <renayama19661014 at ybb.ne.jp> wrote:
>> > Hi Andrew,
>> >
>> > About this problem, how did it turn out afterwards?
>> >
>> > Best Regards,
>> > Hideo Yamauchi.
>> >
>> >
>> > --- On Mon, 2012/1/16, renayama19661014 at ybb.ne.jp <renayama19661014 at ybb.ne.jp> wrote:
>> >
>> >> Hi Andrew,
>> >>
>> >> Thank you for your comments.
>> >>
>> >> > Could you send me the PE file related to this log please?
>> >> >
>> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
>> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
>> >> > /var/lib/pengine/pe-input-4.bz2
>> >>
>> >> The old file is gone.
>> >> I am sending the log and the PE file from reproducing the problem with the same procedure.
>> >>
>> >>  * trac1818.zip
>> >>   * https://skydrive.live.com/?cid=3a14d57622c66876&id=3A14D57622C66876%21127
>> >>
>> >> Best Regards,
>> >> Hideo Yamauchi.
>> >>
>> >>
>> >> --- On Mon, 2012/1/16, Andrew Beekhof <andrew at beekhof.net> wrote:
>> >>
>> >> > On Fri, Jan 6, 2012 at 12:37 PM,  <renayama19661014 at ybb.ne.jp> wrote:
>> >> > > Hi Andrew,
>> >> > >
>> >> > > Thank you for your comment.
>> >> > >
>> >> > >> But it should have a subsequent stop action which would set it back to
>> >> > >> being inactive.
>> >> > >> Did that not happen in this case?
>> >> > >
>> >> > > Yes.
>> >> >
>> >> > Could you send me the PE file related to this log please?
>> >> >
>> >> > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
>> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
>> >> > /var/lib/pengine/pe-input-4.bz2
>> >> >
>> >> >
>> >> >
>> >> > > Only the "verify_stopped" log message is recorded.
>> >> > > The stop operation for the resource that failed its probe was not carried out.
>> >> > >
>> >> > > -----------------------------
>> >> > > ######### yamauchi PREV STOP ##########
>> >> > > Jan  6 19:21:56 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ifcheckd process group 3462 with signal 15
>> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan  6 19:21:56 rh57-1 ifcheckd: [3462]: info: do_node_walk: Requesting the list of configured nodes
>> >> > > Jan  6 19:21:58 rh57-1 ifcheckd: [3462]: info: main: Exiting ifcheckd
>> >> > > Jan  6 19:21:58 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/crmd process group 3461 with signal 15
>> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: crm_shutdown: Requesting shutdown
>> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
>> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
>> >> > > Jan  6 19:21:58 rh57-1 crmd: [3461]: info: do_shutdown_req: Sending shutdown request to DC: rh57-1
>> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: handle_shutdown_request: Creating shutdown request for rh57-1 (state=S_POLICY_ENGINE)
>> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1325845319)
>> >> > > Jan  6 19:21:59 rh57-1 attrd: [3460]: info: attrd_perform_update: Sent update 14: shutdown=1325845319
>> >> > > Jan  6 19:21:59 rh57-1 crmd: [3461]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-1fdd5e2a-44b6-44b9-9993-97fa120072a4-shutdown, name=shutdown, value=1325845319, magic=NA, cib=0.101.16) : Transient attribute: update
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: crm_timer_popped: New Transition Timer (I_PE_CALC) just popped!
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke_callback: Invoking the PE: query=44, ref=pe_calc-dc-1325845321-26, seq=1, quorate=0
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: unpack_config: On loss of CCM Quorum: Ignore
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: determine_online_status: Node rh57-1 is shutting down
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpUltraMonkey
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmVIP       (ocf::heartbeat:LVM):   Stopped
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith1
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-2        (stonith:external/ssh): Stopped
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith1-3        (stonith:meatware):     Stopped
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: group_print:  Resource Group: grpStonith2
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-2        (stonith:external/ssh): Started rh57-1
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: native_print:      prmStonith2-3        (stonith:meatware):     Started rh57-1
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: clone_print:  Clone Set: clnPingd
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: short_print:      Started: [ rh57-1 ]
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: clnPingd: Rolling back scores from prmVIP
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmPingd:0 cannot run anywhere
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmVIP cannot run anywhere
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith1-2: Rolling back scores from prmStonith1-3
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-2 cannot run anywhere
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-3 cannot run anywhere
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith2-2: Rolling back scores from prmStonith2-3
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-2 cannot run anywhere
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-3 cannot run anywhere
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: stage6: Scheduling Node rh57-1 for shutdown
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmVIP     (Stopped)
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-2      (Stopped)
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave   resource prmStonith1-3      (Stopped)
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-2      (rh57-1)
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmStonith2-3      (rh57-1)
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop    resource prmPingd:0 (rh57-1)
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>> >> > > Jan  6 19:22:01 rh57-1 pengine: [3464]: info: process_pe_message: Transition 4: PEngine Input stored in: /var/lib/pengine/pe-input-4.bz2
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: unpack_graph: Unpacked transition 4: 9 actions in 9 synapses
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
>> >> > > Jan  6 19:22:01 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 21: stop prmPingd:0_stop_0 on rh57-1 (local)
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[10] on prmPingd:0 for client 3461, its parameters: CRM_meta_interval=[10000] multiplier=[100] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] name=[default_ping_set] CRM_meta_clone_max=[1] crm_feature_set=[3.0.1] host_list=[192.168.40.1] CRM_meta_globally_unique=[false] CRM_meta_name=[monitor] CRM_meta_clone=[0] CRM_meta_clone_node_max=[1] CRM_meta_notify=[false]  cancelled
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=21:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmPingd:0_stop_0 )
>> >> > > Jan  6 19:22:02 rh57-1 pingd: [3529]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmPingd:0 stop[14] (pid 3612)
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[14] on prmPingd:0 for client 3461: pid 3612 exited with return code 0
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=10, status=1, cib-update=0, confirmed=true) Cancelled
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=14, rc=0, cib-update=45, confirmed=true) ok
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmPingd:0_stop_0 (21) confirmed on rh57-1 (rc=0)
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 4 fired and confirmed
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 16: stop prmStonith2-3_stop_0 on rh57-1 (local)
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[13] on prmStonith2-3 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[600s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor]  cancelled
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=16:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-3_stop_0 )
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-3 stop[15] (pid 3617)
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3617]: info: Try to stop STONITH resource <rsc_id=prmStonith2-3> : Device=meatware
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_monitor_3600000 (call=13, status=1, cib-update=0, confirmed=true) Cancelled
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[15] on prmStonith2-3 for client 3461: pid 3617 exited with return code 0
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_stop_0 (call=15, rc=0, cib-update=46, confirmed=true) ok
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-3_stop_0 (16) confirmed on rh57-1 (rc=0)
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 15: stop prmStonith2-2_stop_0 on rh57-1 (local)
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[11] on prmStonith2-2 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[60s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor]  cancelled
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=15:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-2_stop_0 )
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-2 stop[16] (pid 3619)
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3619]: info: Try to stop STONITH resource <rsc_id=prmStonith2-2> : Device=external/ssh
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_monitor_3600000 (call=11, status=1, cib-update=0, confirmed=true) Cancelled
>> >> > > Jan  6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[16] on prmStonith2-2 for client 3461: pid 3619 exited with return code 0
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_stop_0 (call=16, rc=0, cib-update=47, confirmed=true) ok
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-2_stop_0 (15) confirmed on rh57-1 (rc=0)
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: Executing crm-event (28): do_shutdown on rh57-1
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: crm-event (28) is a local shutdown
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: run_graph: ====================================================
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: notice: run_graph: Transition 4 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4.bz2): Complete
>> >> > > Jan  6 19:22:02 rh57-1 crmd: [3461]: info: te_graph_trigger: Transition 4 is now complete
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_dc_release: DC role released
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> >> > > Jan  6 19:22:03 rh57-1 pengine: [3464]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Transitioner is now inactive
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Disconnecting STONITH...
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: notice: Not currently connected.
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 420 ms (> 100 ms) before being called (GSource: 0x179d9b0)
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: G_SIG_dispatch: started at 429442052 should have started at 429442010
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmdManagedChildDied: Process pengine:[3464] exited (signal=0, exitcode=0)
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 80 ms (> 30 ms) (GSource: 0x179d9b0)
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_msg_dispatch: Received HUP from pengine:[3464]
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: pe_connection_destroy: Connection to the Policy Engine released
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: ERROR: verify_stopped: Resource prmVIP was active at shutdown.  You may ignore this error if it is unmanaged.
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_lrm_control: Disconnected from the LRM
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_ha_control: Disconnected from Heartbeat
>> >> > > Jan  6 19:22:03 rh57-1 ccm: [3456]: info: client (pid=3461) removed from ccm
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_cib_control: Disconnecting CIB
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
>> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: info: cib_process_readwrite: We are now in R/O mode
>> >> > > Jan  6 19:22:03 rh57-1 crmd: [3461]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
>> >> > > Jan  6 19:22:03 rh57-1 cib: [3457]: WARN: send_ipc_message: IPC Channel to 3461 is not connected
>> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
>> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: send_via_callback_channel: Delivery of reply to client 3461/5f69edda-aec9-42c7-ae52-045a05d1c5db failed
>> >> > > Jan  6 19:22:04 rh57-1 crmd: [3461]: info: do_exit: [crmd] stopped (0)
>> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/attrd process group 3460 with signal 15
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x7b28140)
>> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_shutdown: Exiting
>> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: main: Exiting...
>> >> > > Jan  6 19:22:04 rh57-1 attrd: [3460]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/stonithd process group 3459 with signal 15
>> >> > > Jan  6 19:22:04 rh57-1 stonithd: [3459]: notice: /usr/lib64/heartbeat/stonithd normally quit.
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/lrmd -r process group 3458 with signal 15
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
>> >> > > Jan  6 19:22:04 rh57-1 lrmd: [3458]: info: lrmd is shutting down
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/cib process group 3457 with signal 15
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
>> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_shutdown: Disconnected 0 clients
>> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: cib_process_disconnect: All clients disconnected...
>> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
>> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: Exiting...
>> >> > > Jan  6 19:22:04 rh57-1 cib: [3457]: info: main: Done
>> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: client (pid=3457) removed from ccm
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ccm process group 3456 with signal 15
>> >> > > Jan  6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource: 0x7b28140)
>> >> > > Jan  6 19:22:04 rh57-1 ccm: [3456]: info: received SIGTERM, going to shut down
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBFIFO process 3446 with signal 15
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3447 with signal 15
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3448 with signal 15
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3449 with signal 15
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3450 with signal 15
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 exited. 5 remaining
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 exited. 4 remaining
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 exited. 3 remaining
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 exited. 2 remaining
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 exited. 1 remaining
>> >> > > Jan  6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat shutdown complete.
>> >> > >
>> >> > > -----------------------------
>> >> > >
>> >> > >
>> >> > >
>> >> > > Best Regards,
>> >> > > Hideo Yamauchi.
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > > --- On Fri, 2012/1/6, Andrew Beekhof <andrew at beekhof.net> wrote:
>> >> > >
>> >> > >> On Tue, Dec 27, 2011 at 6:15 PM,  <renayama19661014 at ybb.ne.jp> wrote:
>> >> > >> > Hi All,
>> >> > >> >
>> >> > >> > When Pacemaker is stopped while there is a resource whose probe failed, crmd outputs the following error message.
>> >> > >> >
>> >> > >> >
>> >> > >> >  Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown.  You may ignore this error if it is unmanaged.
>> >> > >> >
>> >> > >> >
>> >> > >> > Because a resource whose probe failed is never started,
>> >> > >>
>> >> > >> But it should have a subsequent stop action which would set it back to
>> >> > >> being inactive.
>> >> > >> Did that not happen in this case?
>> >> > >>
>> >> > >> > this error message is not right.
>> >> > >> >
>> >> > >> > I think the following correction may be good, but we are not certain.
>> >> > >> >
>> >> > >> >
>> >> > >> >  * crmd/lrm.c
>> >> > >> >  (snip)
>> >> > >> >                } else if(op->rc == EXECRA_NOT_RUNNING) {
>> >> > >> >                        active = FALSE;
>> >> > >> > +                } else if(op->rc != EXECRA_OK && op->interval == 0
>> >> > >> > +                                && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
>> >> > >> > +                        active = FALSE;
>> >> > >> >                } else {
>> >> > >> >                        active = TRUE;
>> >> > >> >                }
>> >> > >> >  (snip)
>> >> > >> >
>> >> > >> >
>> >> > >> > In the Pacemaker development source, this handling seems to have changed considerably.
>> >> > >> > If possible, we request that this change be backported to the Pacemaker 1.0 series.
>> >> > >> >
>> >> > >> > Best Regards,
>> >> > >> > Hideo Yamauchi.
>> >> > >> >
>> >> > >> >
>> >> > >> >
>> >> > >> > _______________________________________________
>> >> > >> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> >> > >> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >> > >> >
>> >> > >> > Project Home: http://www.clusterlabs.org
>> >> > >> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >> > >> > Bugs: http://bugs.clusterlabs.org
>> >> > >>
>> >> > >
>> >> >
>> >>
>> >
>>
>
    
    