[Pacemaker] help deciphering output

Alex Samad - Yieldbroker Alex.Samad at yieldbroker.com
Wed Oct 8 19:22:45 EDT 2014


One of my nodes died in a 2 node cluster

I gather something went wrong, and it fenced/killed itself. But I am not sure what happened.

I think maybe around that time the VM backups happened and snap of the VM could have happened

But there is nothing for me to put my finger on

Output from messages around that time

This is on devrp1
Oct  8 23:31:38 devrp1 corosync[1670]:   [TOTEM ] A processor failed, forming new configuration.
Oct  8 23:31:40 devrp1 corosync[1670]:   [CMAN  ] quorum lost, blocking activity
Oct  8 23:31:40 devrp1 corosync[1670]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Oct  8 23:31:40 devrp1 corosync[1670]:   [QUORUM] Members[1]: 1
Oct  8 23:31:40 devrp1 corosync[1670]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct  8 23:31:40 devrp1 corosync[1670]:   [CPG   ] chosen downlist: sender r(0) ip(10.172.214.51) ; members(old:2 left:1)
Oct  8 23:31:40 devrp1 corosync[1670]:   [MAIN  ] Completed service synchronization, ready to provide service.
Oct  8 23:31:41 devrp1 kernel: dlm: closing connection to node 2
Oct  8 23:31:42 devrp1 crmd[2350]:   notice: cman_event_callback: Membership 424: quorum lost
Oct  8 23:31:42 devrp1 corosync[1670]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct  8 23:31:42 devrp1 corosync[1670]:   [CMAN  ] quorum regained, resuming activity
Oct  8 23:31:42 devrp1 corosync[1670]:   [QUORUM] This node is within the primary component and will provide service.
Oct  8 23:31:42 devrp1 corosync[1670]:   [QUORUM] Members[2]: 1 2
Oct  8 23:31:42 devrp1 corosync[1670]:   [QUORUM] Members[2]: 1 2
Oct  8 23:31:42 devrp1 corosync[1670]:   [CPG   ] chosen downlist: sender r(0) ip(10.172.214.51) ; members(old:1 left:0)
Oct  8 23:31:42 devrp1 corosync[1670]:   [MAIN  ] Completed service synchronization, ready to provide service.
Oct  8 23:31:42 devrp1 crmd[2350]:   notice: crm_update_peer_state: cman_event_callback: Node devrp2[2] - state is now lost (was member)
Oct  8 23:31:42 devrp1 crmd[2350]:  warning: reap_dead_nodes: Our DC node (devrp2) left the cluster
Oct  8 23:31:42 devrp1 crmd[2350]:   notice: cman_event_callback: Membership 428: quorum acquired
Oct  8 23:31:42 devrp1 crmd[2350]:   notice: crm_update_peer_state: cman_event_callback: Node devrp2[2] - state is now member (was lost)
Oct  8 23:31:42 devrp1 crmd[2350]:   notice: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=reap_dead_nodes ]
Oct  8 23:31:42 devrp1 corosync[1670]: cman killed by node 2 because we were killed by cman_tool or other application
Oct  8 23:31:42 devrp1 pacemakerd[2339]:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Oct  8 23:31:42 devrp1 stonith-ng[2346]:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Oct  8 23:31:42 devrp1 crmd[2350]:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Oct  8 23:31:42 devrp1 crmd[2350]:    error: crmd_cs_destroy: connection terminated
Oct  8 23:31:43 devrp1 fenced[1726]: cluster is down, exiting
Oct  8 23:31:43 devrp1 fenced[1726]: daemon cpg_dispatch error 2
Oct  8 23:31:43 devrp1 attrd[2348]:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Oct  8 23:31:43 devrp1 attrd[2348]:     crit: attrd_cs_destroy: Lost connection to Corosync service!
Oct  8 23:31:43 devrp1 attrd[2348]:   notice: main: Exiting...
Oct  8 23:31:43 devrp1 attrd[2348]:   notice: main: Disconnecting client 0x18cf240, pid=2350...
Oct  8 23:31:43 devrp1 pacemakerd[2339]:    error: mcp_cpg_destroy: Connection destroyed
Oct  8 23:31:43 devrp1 cib[2345]:    error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Oct  8 23:31:43 devrp1 cib[2345]:    error: cib_cs_destroy: Corosync connection lost!  Exiting.
Oct  8 23:31:43 devrp1 stonith-ng[2346]:    error: stonith_peer_cs_destroy: Corosync connection terminated
Oct  8 23:31:43 devrp1 dlm_controld[1752]: cluster is down, exiting
Oct  8 23:31:43 devrp1 dlm_controld[1752]: daemon cpg_dispatch error 2
Oct  8 23:31:43 devrp1 gfs_controld[1801]: cluster is down, exiting
Oct  8 23:31:43 devrp1 crmd[2350]:   notice: crmd_exit: Forcing immediate exit: Link has been severed (67)
Oct  8 23:31:43 devrp1 attrd[2348]:    error: attrd_cib_connection_destroy: Connection to the CIB terminated...
Oct  8 23:31:43 devrp1 lrmd[2347]:  warning: qb_ipcs_event_sendv: new_event_notification (2347-2350-6): Bad file descriptor (9)
Oct  8 23:31:43 devrp1 lrmd[2347]:  warning: send_client_notify: Notification of client crmd/94e94935-2221-434d-8a6f-90eba4ede55b failed
Oct  8 23:31:43 devrp1 lrmd[2347]:  warning: send_client_notify: Notification of client crmd/94e94935-2221-434d-8a6f-90eba4ede55b failed


Devrp2
Oct  8 23:31:26 devrp2 kernel: IN=eth0 OUT= MAC=00:50:56:a6:3a:5d:00:00:00:00:00:00:08:00 SRC=10.0.0.0 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 PROTO=2 
Oct  8 23:31:42 devrp2 crmd[2361]:   notice: cman_event_callback: Membership 428: quorum lost
Oct  8 23:31:43 devrp2 fenced[1747]: telling cman to remove nodeid 1 from cluster
Oct  8 23:31:45 devrp2 corosync[1685]:   [TOTEM ] A processor failed, forming new configuration.
Oct  8 23:31:45 devrp2 corosync[1685]:   [CMAN  ] quorum lost, blocking activity
Oct  8 23:31:45 devrp2 corosync[1685]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Oct  8 23:31:45 devrp2 corosync[1685]:   [QUORUM] Members[1]: 2
Oct  8 23:31:45 devrp2 corosync[1685]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct  8 23:31:45 devrp2 corosync[1685]:   [CMAN  ] quorum regained, resuming activity
Oct  8 23:31:45 devrp2 corosync[1685]:   [QUORUM] This node is within the primary component and will provide service.
Oct  8 23:31:45 devrp2 corosync[1685]:   [QUORUM] Members[2]: 1 2
Oct  8 23:31:45 devrp2 corosync[1685]:   [QUORUM] Members[2]: 1 2
Oct  8 23:31:45 devrp2 corosync[1685]:   [CPG   ] chosen downlist: sender r(0) ip(10.172.214.52) ; members(old:2 left:1)
Oct  8 23:31:45 devrp2 corosync[1685]:   [MAIN  ] Completed service synchronization, ready to provide service.
Oct  8 23:31:45 devrp2 fenced[1747]: receive_start 1:3 add node with started_count 1
Oct  8 23:31:45 devrp2 crmd[2361]:   notice: cman_event_callback: Membership 428: quorum acquired
Oct  8 23:31:45 devrp2 crmd[2361]:   notice: crm_update_peer_state: cman_event_callback: Node devrp1[1] - state is now lost (was member)
Oct  8 23:31:45 devrp2 crmd[2361]:  warning: match_down_event: No match for shutdown action on devrp1
Oct  8 23:31:45 devrp2 crmd[2361]:   notice: peer_update_callback: Stonith/shutdown of devrp1 not matched
Oct  8 23:31:49 devrp2 crmd[2361]:   notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
Oct  8 23:31:49 devrp2 crmd[2361]:  warning: match_down_event: No match for shutdown action on devrp1
Oct  8 23:31:49 devrp2 crmd[2361]:   notice: peer_update_callback: Stonith/shutdown of devrp1 not matched
Oct  8 23:31:49 devrp2 crmd[2361]:   notice: do_election_count_vote: Election 2 (current: 2, owner: devrp1): Processed vote from devrp1 (Peer is not part of our cluster)
Oct  8 23:31:49 devrp2 pengine[2360]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Oct  8 23:31:49 devrp2 kernel: dlm: closing connection to node 1
Oct  8 23:31:49 devrp2 attrd[2359]:   notice: attrd_local_callback: Sending full refresh (origin=crmd)
Oct  8 23:31:49 devrp2 attrd[2359]:   notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ybrpstat (1412658023)
Oct  8 23:31:49 devrp2 dlm_controld[1773]: /sys/kernel/config/dlm/cluster/comms/1: rmdir failed: 2
Oct  8 23:31:49 devrp2 attrd[2359]:   notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Oct  8 23:31:49 devrp2 pengine[2360]:   notice: LogActions: Start   ybrpip#011(devrp2)
Oct  8 23:31:49 devrp2 pengine[2360]:   notice: process_pe_message: Calculated Transition 1061: /var/lib/pacemaker/pengine/pe-input-2282.bz2
Oct  8 23:31:50 devrp2 pengine[2360]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Oct  8 23:31:50 devrp2 pengine[2360]:   notice: LogActions: Start   ybrpip#011(devrp2)
Oct  8 23:31:50 devrp2 pengine[2360]:   notice: process_pe_message: Calculated Transition 1062: /var/lib/pacemaker/pengine/pe-input-2283.bz2
Oct  8 23:31:50 devrp2 crmd[2361]:   notice: te_rsc_command: Initiating action 5: start ybrpip_start_0 on devrp2 (local)
Oct  8 23:31:51 devrp2 IPaddr2(ybrpip)[30442]: INFO: Adding inet address 10.172.214.50/24 with broadcast address 10.172.214.255 to device eth0
Oct  8 23:31:51 devrp2 IPaddr2(ybrpip)[30442]: INFO: Bringing device eth0 up
Oct  8 23:31:51 devrp2 IPaddr2(ybrpip)[30442]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.172.214.50 eth0 10.172.214.50 auto not_used not_used
Oct  8 23:31:51 devrp2 crmd[2361]:   notice: process_lrm_event: LRM operation ybrpip_start_0 (call=995, rc=0, cib-update=1494, confirmed=true) ok
Oct  8 23:31:51 devrp2 crmd[2361]:   notice: te_rsc_command: Initiating action 6: monitor ybrpip_monitor_5000 on devrp2 (local)
Oct  8 23:31:51 devrp2 crmd[2361]:   notice: process_lrm_event: LRM operation ybrpip_monitor_5000 (call=998, rc=0, cib-update=1495, confirmed=false) ok
Oct  8 23:31:51 devrp2 crmd[2361]:   notice: run_graph: Transition 1062 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2283.bz2): Complete
Oct  8 23:31:51 devrp2 crmd[2361]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct  8 23:31:52 devrp2 ntpd[2099]: Listen normally on 6 eth0 10.172.214.50 UDP 123
Oct  8 23:31:52 devrp2 ntpd[2099]: peers refreshed
Oct  8 23:31:52 devrp2 corosync[1685]:   [TOTEM ] A processor failed, forming new configuration.
Oct  8 23:31:54 devrp2 corosync[1685]:   [CMAN  ] quorum lost, blocking activity
Oct  8 23:31:54 devrp2 corosync[1685]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Oct  8 23:31:54 devrp2 corosync[1685]:   [QUORUM] Members[1]: 2
Oct  8 23:31:54 devrp2 corosync[1685]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct  8 23:31:54 devrp2 crmd[2361]:   notice: cman_event_callback: Membership 432: quorum lost
Oct  8 23:31:54 devrp2 corosync[1685]:   [CPG   ] chosen downlist: sender r(0) ip(10.172.214.52) ; members(old:2 left:1)
Oct  8 23:31:54 devrp2 corosync[1685]:   [MAIN  ] Completed service synchronization, ready to provide service.
Oct  8 23:31:54 devrp2 crmd[2361]:  warning: match_down_event: No match for shutdown action on devrp1
Oct  8 23:31:54 devrp2 crmd[2361]:   notice: peer_update_callback: Stonith/shutdown of devrp1 not matched
Oct  8 23:31:54 devrp2 crmd[2361]:   notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
Oct  8 23:31:54 devrp2 attrd[2359]:   notice: attrd_local_callback: Sending full refresh (origin=crmd)
Oct  8 23:31:54 devrp2 attrd[2359]:   notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ybrpstat (1412658023)
Oct  8 23:31:55 devrp2 attrd[2359]:   notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Oct  8 23:31:55 devrp2 pengine[2360]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Oct  8 23:31:55 devrp2 pengine[2360]:   notice: process_pe_message: Calculated Transition 1063: /var/lib/pacemaker/pengine/pe-input-2284.bz2
Oct  8 23:31:55 devrp2 crmd[2361]:   notice: run_graph: Transition 1063 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2284.bz2): Complete
Oct  8 23:31:55 devrp2 crmd[2361]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct  8 23:31:59 devrp2 kernel: IN=eth0 OUT= MAC=00:50:56:a6:3a:5d:00:00:00:00:00:00:08:00 SRC=10.0.0.0 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 PROTO=2 
Oct  8 23:36:52 devrp2 kernel: IN=eth0 OUT= MAC=00:50:56:a6:3a:5d:00:00:00:00:00:00:08:00 SRC=10.0.0.0 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 PROTO=2 
Oct  8 23:46:55 devrp2 crmd[2361]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Oct  8 23:46:55 devrp2 pengine[2360]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Oct  8 23:46:55 devrp2 pengine[2360]:   notice: process_pe_message: Calculated Transition 1064: /var/lib/pacemaker/pengine/pe-input-2284.bz2
Oct  8 23:46:55 devrp2 crmd[2361]:   notice: run_graph: Transition 1064 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2284.bz2): Complete
Oct  8 23:46:55 devrp2 crmd[2361]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]




More information about the Pacemaker mailing list