[Pacemaker] error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again

Brian J. Murrell (brian) brian at interlinx.bc.ca
Fri Dec 6 21:46:08 UTC 2013


I seem to have another instance where pacemaker fails to exit at the end
of a shutdown.  Here's the log from the start of the "service pacemaker
stop":

Dec  3 13:00:39 wtm-60vm8 crmd[14076]:   notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Dec  3 13:00:39 wtm-60vm8 crmd[14076]:     info: do_te_invoke: Processing graph 19 (ref=pe_calc-dc-1386093636-83) derived from /var/lib/pengine/pe-input-40.bz2
Dec  3 13:00:39 wtm-60vm8 crmd[14076]:   notice: run_graph: ==== Transition 19 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-40.bz2): Complete
Dec  3 13:00:39 wtm-60vm8 crmd[14076]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec  3 13:00:39 wtm-60vm8 pengine[14075]:   notice: process_pe_message: Transition 19: PEngine Input stored in: /var/lib/pengine/pe-input-40.bz2
Dec  3 13:02:39 wtm-60vm8 crmd[14076]:     info: handle_shutdown_request: Creating shutdown request for wtm-60vm7 (state=S_IDLE)
Dec  3 13:02:39 wtm-60vm8 crmd[14076]:     info: abort_transition_graph: te_update_diff:176 - Triggered transition abort (complete=1, tag=nvpair, id=status-wtm-60vm7-shutdown, name=shutdown, value=1386093759, magic=NA, cib=0.48.3) : Transient attribute: update
Dec  3 13:02:39 wtm-60vm8 crmd[14076]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Dec  3 13:02:39 wtm-60vm8 pengine[14075]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Dec  3 13:02:39 wtm-60vm8 pengine[14075]:   notice: stage6: Scheduling Node wtm-60vm7 for shutdown
Dec  3 13:02:39 wtm-60vm8 pengine[14075]:   notice: LogActions: Move    rsrc1#011(Started wtm-60vm7 -> wtm-60vm8)
Dec  3 13:02:39 wtm-60vm8 crmd[14076]:   notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Dec  3 13:02:39 wtm-60vm8 crmd[14076]:     info: do_te_invoke: Processing graph 20 (ref=pe_calc-dc-1386093759-84) derived from /var/lib/pengine/pe-input-41.bz2
Dec  3 13:02:39 wtm-60vm8 crmd[14076]:     info: te_rsc_command: Initiating action 11: stop rsrc1_stop_0 on wtm-60vm7
Dec  3 13:02:39 wtm-60vm8 pengine[14075]:   notice: process_pe_message: Transition 20: PEngine Input stored in: /var/lib/pengine/pe-input-41.bz2
Dec  3 13:02:44 wtm-60vm8 crmd[14076]:     info: te_rsc_command: Initiating action 12: start rsrc1_start_0 on wtm-60vm8 (local)
Dec  3 13:02:44 wtm-60vm8 lrmd: [14073]: info: rsc:rsrc1:14: start
Dec  3 13:02:44 wtm-60vm8 crmd[14076]:     info: te_crm_command: Executing crm-event (16): do_shutdown on wtm-60vm7
Dec  3 13:02:44 wtm-60vm8 crmd[14076]:   notice: crmd_peer_update: Status update: Client wtm-60vm7/crmd now has status [offline] (DC=true)
Dec  3 13:02:44 wtm-60vm8 cib[14071]:     info: cib_process_shutdown_req: Shutdown REQ from wtm-60vm7
Dec  3 13:02:44 wtm-60vm8 cib[14071]:     info: cib_process_request: Operation complete: op cib_shutdown_req for section 'all' (origin=wtm-60vm7/wtm-60vm7/(null), version=0.48.5): ok (rc=0)
Dec  3 13:02:51 wtm-60vm8 corosync[14032]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec  3 13:02:51 wtm-60vm8 cib[14071]:   notice: ais_dispatch_message: Membership 20: quorum lost
Dec  3 13:02:51 wtm-60vm8 cib[14071]:     info: crm_update_peer: Node wtm-60vm7: id=3866040492 state=lost (new) addr=r(0) ip(172.24.111.230) r(1) ip(10.0.0.102)  votes=1 born=16 seen=16 proc=00000000000000000000000000000002
Dec  3 13:02:51 wtm-60vm8 crmd[14076]:   notice: ais_dispatch_message: Membership 20: quorum lost
Dec  3 13:02:51 wtm-60vm8 crmd[14076]:     info: ais_status_callback: status: wtm-60vm7 is now lost (was member)
Dec  3 13:02:51 wtm-60vm8 crmd[14076]:     info: crm_update_peer: Node wtm-60vm7: id=3866040492 state=lost (new) addr=r(0) ip(172.24.111.230) r(1) ip(10.0.0.102)  votes=1 born=16 seen=16 proc=00000000000000000000000000000002
Dec  3 13:02:51 wtm-60vm8 cib[14071]:     info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/180, version=0.48.6): ok (rc=0)
Dec  3 13:02:51 wtm-60vm8 cib[14071]:     info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/182, version=0.48.8): ok (rc=0)
Dec  3 13:02:51 wtm-60vm8 crmd[14076]:     info: crmd_ais_dispatch: Setting expected votes to 2
Dec  3 13:02:51 wtm-60vm8 cib[14071]:     info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/184, version=0.48.9): ok (rc=0)
Dec  3 13:02:52 wtm-60vm8 pacemakerd[14067]:     info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Dec  3 13:02:52 wtm-60vm8 pacemakerd[14067]:   notice: pcmk_shutdown_worker: Shuting down Pacemaker
Dec  3 13:02:52 wtm-60vm8 pacemakerd[14067]:   notice: stop_child: Stopping crmd: Sent -15 to process 14076
Dec  3 13:02:52 wtm-60vm8 crmd[14076]:     info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Dec  3 13:02:52 wtm-60vm8 crmd[14076]:   notice: crm_shutdown: Requesting shutdown, upper limit is 1200000ms
Dec  3 13:02:52 wtm-60vm8 crmd[14076]:  warning: do_log: FSA: Input I_SHUTDOWN from crm_shutdown() received in state S_TRANSITION_ENGINE
Dec  3 13:02:52 wtm-60vm8 crmd[14076]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Dec  3 13:02:52 wtm-60vm8 crmd[14076]:     info: do_shutdown_req: Sending shutdown request to wtm-60vm8
Dec  3 13:02:52 wtm-60vm8 crmd[14076]:     info: handle_shutdown_request: Creating shutdown request for wtm-60vm8 (state=S_POLICY_ENGINE)
Dec  3 13:02:52 wtm-60vm8 attrd[14074]:   notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1386093772)
Dec  3 13:02:52 wtm-60vm8 attrd[14074]:   notice: attrd_perform_update: Sent update 45: shutdown=1386093772
Dec  3 13:02:52 wtm-60vm8 crmd[14076]:     info: abort_transition_graph: te_update_diff:176 - Triggered transition abort (complete=0, tag=nvpair, id=status-wtm-60vm8-shutdown, name=shutdown, value=1386093772, magic=NA, cib=0.48.11) : Transient attribute: update
Dec  3 13:02:56 wtm-60vm8 cib[14071]:     info: cib_stats: Processed 654 operations (1146.00us average, 0% utilization) in the last 10min
Dec  3 13:03:20 wtm-60vm8 crmd[14076]:     info: process_lrm_event: LRM operation rsrc1_start_0 (call=14, rc=0, cib-update=186, confirmed=true) ok
Dec  3 13:03:20 wtm-60vm8 crmd[14076]:   notice: run_graph: ==== Transition 20 (Complete=4, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pengine/pe-input-41.bz2): Stopped
Dec  3 13:03:20 wtm-60vm8 pengine[14075]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Dec  3 13:03:20 wtm-60vm8 pengine[14075]:   notice: stage6: Scheduling Node wtm-60vm8 for shutdown
Dec  3 13:03:20 wtm-60vm8 pengine[14075]:   notice: LogActions: Stop    st-fencing#011(wtm-60vm8)
Dec  3 13:03:20 wtm-60vm8 pengine[14075]:   notice: LogActions: Stop    rsrc2#011(wtm-60vm8)
Dec  3 13:03:20 wtm-60vm8 pengine[14075]:   notice: LogActions: Stop    rsrc1#011(wtm-60vm8)
Dec  3 13:03:20 wtm-60vm8 crmd[14076]:   notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Dec  3 13:03:20 wtm-60vm8 crmd[14076]:     info: do_te_invoke: Processing graph 21 (ref=pe_calc-dc-1386093800-90) derived from /var/lib/pengine/pe-input-42.bz2
Dec  3 13:03:20 wtm-60vm8 crmd[14076]:     info: te_rsc_command: Initiating action 6: stop rsrc2_stop_0 on wtm-60vm8 (local)
Dec  3 13:03:20 wtm-60vm8 lrmd: [14073]: info: cancel_op: operation monitor[13] on ocf::Target::rsrc2 for client 14076, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.6] target=[a70b96ee-10eb-41df-80a9-cdefdab77058] CRM_meta_timeout=[60000] CRM_meta_interval=[5000]  cancelled
Dec  3 13:03:20 wtm-60vm8 lrmd: [14073]: info: rsc:rsrc2:15: stop
Dec  3 13:03:20 wtm-60vm8 crmd[14076]:     info: te_rsc_command: Initiating action 7: stop rsrc1_stop_0 on wtm-60vm8 (local)
Dec  3 13:03:20 wtm-60vm8 lrmd: [14073]: info: rsc:rsrc1:16: stop
Dec  3 13:03:20 wtm-60vm8 crmd[14076]:     info: process_lrm_event: LRM operation rsrc2_monitor_5000 (call=13, status=1, cib-update=0, confirmed=true) Cancelled
Dec  3 13:03:20 wtm-60vm8 pengine[14075]:   notice: process_pe_message: Transition 21: PEngine Input stored in: /var/lib/pengine/pe-input-42.bz2
Dec  3 13:03:26 wtm-60vm8 crmd[14076]:     info: process_lrm_event: LRM operation rsrc1_stop_0 (call=16, rc=0, cib-update=188, confirmed=true) ok
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: process_lrm_event: LRM operation rsrc2_stop_0 (call=15, rc=0, cib-update=189, confirmed=true) ok
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: te_rsc_command: Initiating action 5: stop st-fencing_stop_0 on wtm-60vm8 (local)
Dec  3 13:03:38 wtm-60vm8 lrmd: [14073]: info: rsc:st-fencing:17: stop
Dec  3 13:03:38 wtm-60vm8 stonith-ng[14072]:     info: stonith_device_remove: Removed 'st-fencing' from the device list (0 active devices)
Dec  3 13:03:38 wtm-60vm8 stonith-ng[14072]:     info: stonith_command: Processed st_device_remove from lrmd: rc=0
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: process_lrm_event: LRM operation st-fencing_stop_0 (call=17, rc=0, cib-update=190, confirmed=true) ok
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: te_crm_command: Executing crm-event (10): do_shutdown on wtm-60vm8
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: te_crm_command: crm-event (10) is a local shutdown
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:   notice: run_graph: ==== Transition 21 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-42.bz2): Complete
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: do_dc_release: DC role released
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: pe_connection_destroy: Connection to the Policy Engine released
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: do_te_control: Transitioner is now inactive
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: do_shutdown: Disconnecting STONITH...
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: tengine_stonith_connection_destroy: Fencing daemon disconnected
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: do_lrm_control: Disconnected from the LRM
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:   notice: terminate_ais_connection: Disconnecting from AIS
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: do_ha_control: Disconnected from OpenAIS
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: do_cib_control: Disconnecting CIB
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: crmd_cib_connection_destroy: Connection to the CIB terminated...
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: free_mem: Dropping I_RELEASE_SUCCESS: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_dc_release ]
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: crm_xml_cleanup: Cleaning up memory from libxml2
Dec  3 13:03:38 wtm-60vm8 crmd[14076]:     info: do_exit: [crmd] stopped (0)
Dec  3 13:03:38 wtm-60vm8 cib[14071]:     info: cib_process_readwrite: We are now in R/O mode
Dec  3 13:03:38 wtm-60vm8 cib[14071]:  warning: send_ipc_message: IPC Channel to 14076 is not connected
Dec  3 13:03:38 wtm-60vm8 cib[14071]:  warning: send_via_callback_channel: Delivery of reply to client 14076/6481ffbc-6863-4151-a0e4-9dec1ba2e2af failed
Dec  3 13:03:38 wtm-60vm8 cib[14071]:  warning: do_local_notify: A-Sync reply to crmd failed: reply failed
Dec  3 13:03:38 wtm-60vm8 pacemakerd[14067]:     info: pcmk_child_exit: Child process crmd exited (pid=14076, rc=0)
Dec  3 13:03:53 wtm-60vm8 pacemakerd[14067]:    error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again
Dec  3 13:03:53 wtm-60vm8 pacemakerd[14067]:   notice: stop_child: Stopping pengine: Sent -15 to process 14075
Dec  3 13:03:53 wtm-60vm8 pacemakerd[14067]:     info: pcmk_child_exit: Child process pengine exited (pid=14075, rc=0)
Dec  3 13:04:08 wtm-60vm8 pacemakerd[14067]:    error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again
Dec  3 13:04:08 wtm-60vm8 pacemakerd[14067]:   notice: stop_child: Stopping attrd: Sent -15 to process 14074
Dec  3 13:04:08 wtm-60vm8 attrd[14074]:   notice: main: Exiting...
Dec  3 13:04:08 wtm-60vm8 pacemakerd[14067]:     info: pcmk_child_exit: Child process attrd exited (pid=14074, rc=0)
Dec  3 13:04:08 wtm-60vm8 pacemakerd[14067]:  warning: send_ipc_message: IPC Channel to 14074 is not connected
Dec  3 13:04:23 wtm-60vm8 pacemakerd[14067]:    error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again
Dec  3 13:04:23 wtm-60vm8 pacemakerd[14067]:   notice: stop_child: Stopping lrmd: Sent -15 to process 14073
Dec  3 13:04:23 wtm-60vm8 lrmd: [14073]: info: lrmd is shutting down
Dec  3 13:04:23 wtm-60vm8 pacemakerd[14067]:     info: pcmk_child_exit: Child process lrmd exited (pid=14073, rc=0)
Dec  3 13:04:38 wtm-60vm8 pacemakerd[14067]:    error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again
Dec  3 13:04:38 wtm-60vm8 pacemakerd[14067]:   notice: stop_child: Stopping stonith-ng: Sent -15 to process 14072
Dec  3 13:04:38 wtm-60vm8 stonith-ng[14072]:     info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Dec  3 13:04:38 wtm-60vm8 stonith-ng[14072]:     info: stonith_shutdown: Terminating with  0 clients
Dec  3 13:04:38 wtm-60vm8 pacemakerd[14067]:     info: pcmk_child_exit: Child process stonith-ng exited (pid=14072, rc=0)
Dec  3 13:04:53 wtm-60vm8 pacemakerd[14067]:    error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again
Dec  3 13:04:53 wtm-60vm8 pacemakerd[14067]:   notice: stop_child: Stopping cib: Sent -15 to process 14071
Dec  3 13:04:53 wtm-60vm8 cib[14071]:     info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Dec  3 13:04:53 wtm-60vm8 cib[14071]:     info: cib_shutdown: Disconnected 0 clients
Dec  3 13:04:53 wtm-60vm8 cib[14071]:     info: cib_process_disconnect: All clients disconnected...
Dec  3 13:04:53 wtm-60vm8 cib[14071]:     info: cib_ha_connection_destroy: Heartbeat disconnection complete... exiting
Dec  3 13:04:53 wtm-60vm8 cib[14071]:     info: cib_ha_connection_destroy: Exiting...
Dec  3 13:04:53 wtm-60vm8 cib[14071]:     info: crm_xml_cleanup: Cleaning up memory from libxml2
Dec  3 13:04:53 wtm-60vm8 cib[14071]:     info: main: Done
Dec  3 13:04:53 wtm-60vm8 pacemakerd[14067]:     info: pcmk_child_exit: Child process cib exited (pid=14071, rc=0)
Dec  3 13:05:08 wtm-60vm8 pacemakerd[14067]:    error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again
Dec  3 13:05:08 wtm-60vm8 pacemakerd[14067]:   notice: pcmk_shutdown_worker: Shutdown complete
Dec  3 13:05:08 wtm-60vm8 pacemakerd[14067]:     info: main: Exiting pacemakerd

These types of shutdown failure issues seem to always end up with the series of:

error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again

Even though the above messages seem to indicate that pacemaker did
finally exit it did not as can be seen looking at the process table:

14032 ?        Ssl    0:01 corosync
14067 ?        S      0:00 pacemakerd
14071 ?        Ss     0:00  \_ /usr/libexec/pacemaker/cib

So what does this "sending message via cpg FAILED: (rc=6)" mean exactly?

Or any other ideas what happened to this shutdown to cause it to fail/hang ultimately?

Cheers,
b.








More information about the Pacemaker mailing list