[ClusterLabs] Sudden stop of pacemaker functions
Klechomir
klecho at gmail.com
Wed Feb 17 12:10:07 UTC 2016
Hi List,
I've been having a strange issue lately.
I have a two-node cluster with some cloned resources on it.
One of my nodes suddenly starts reporting all of its resources as down
(some of them are actually running), stops logging, and remains in this
state forever, while still responding to crm commands.
The curious thing is that restarting corosync/pacemaker doesn't change
anything.
Here are the last lines in the log after a restart:
Feb 17 12:55:17 [609415] CLUSTER-1 crmd: notice: do_started: The local CRM is operational
Feb 17 12:55:17 [609415] CLUSTER-1 crmd: info: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Feb 17 12:55:17 [609409] CLUSTER-1 cib: info: cib_process_replace: Digest matched on replace from CLUSTER-2: f7cb10ecaff6cfd1661ca7ec779192b3
Feb 17 12:55:17 [609409] CLUSTER-1 cib: info: cib_process_replace: Replaced 0.238.1 with 0.238.40 from CLUSTER-2
Feb 17 12:55:17 [609409] CLUSTER-1 cib: info: cib_replace_notify: Replaced: 0.238.1 -> 0.238.40 from CLUSTER-2
Feb 17 12:55:18 [609415] CLUSTER-1 crmd: info: update_dc: Set DC to CLUSTER-2 (3.0.6)
Feb 17 12:55:19 [609411] CLUSTER-1 stonith-ng: info: stonith_command: Processed register from crmd.609415: OK (0)
Feb 17 12:55:19 [609411] CLUSTER-1 stonith-ng: info: stonith_command: Processed st_notify from crmd.609415: OK (0)
Feb 17 12:55:19 [609411] CLUSTER-1 stonith-ng: info: stonith_command: Processed st_notify from crmd.609415: OK (0)
Feb 17 12:55:19 [609415] CLUSTER-1 crmd: info: erase_status_tag: Deleting xpath: //node_state[@uname='CLUSTER-1']/transient_attributes
Feb 17 12:55:19 [609415] CLUSTER-1 crmd: info: update_attrd: Connecting to attrd... 5 retries remaining
Feb 17 12:55:19 [609415] CLUSTER-1 crmd: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Feb 17 12:55:19 [609413] CLUSTER-1 attrd: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Feb 17 12:55:19 [609409] CLUSTER-1 cib: info: cib_process_replace: Digest matched on replace from CLUSTER-2: f7cb10ecaff6cfd1661ca7ec779192b3
Feb 17 12:55:19 [609409] CLUSTER-1 cib: info: cib_process_replace: Replaced 0.238.40 with 0.238.40 from CLUSTER-2
Feb 17 12:55:21 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback: Update shutdown=(null) failed: No such device or address
Feb 17 12:55:22 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback: Update terminate=(null) failed: No such device or address
Feb 17 12:55:25 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback: Update pingd=(null) failed: No such device or address
Feb 17 12:55:26 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback: Update fail-count-p_Samba_Server=(null) failed: No such device or address
Feb 17 12:55:26 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback: Update master-p_Device_drbddrv1=(null) failed: No such device or address
Feb 17 12:55:27 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback: Update last-failure-p_Samba_Server=(null) failed: No such device or address
Feb 17 12:55:27 [609413] CLUSTER-1 attrd: warning: attrd_cib_callback: Update probe_complete=(null) failed: No such device or address
After that, logging on the problematic node stops entirely.
Corosync is v2.1.0.26; Pacemaker is v1.1.8.
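One side note on the attrd warnings above, in case it helps: "No such
device or address" is the standard strerror text for ENXIO. As far as I
understand it (this is my assumption, not something confirmed in the log),
Pacemaker reuses system errno codes for its internal return values, so
these warnings would mean the transient attributes attrd tried to update
could not be found in the CIB. A quick sketch confirming the errno mapping
(Linux assumed):

```python
import errno
import os

# ENXIO is errno 6 on Linux; its strerror text matches the attrd warnings
# ("No such device or address") seen in the log above.
code = errno.ENXIO
print(code, os.strerror(code))
```

This doesn't explain the root cause by itself, but it narrows the failing
updates down to "object not found" rather than a device or I/O problem.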
Regards,
Klecho