[Pacemaker] node1 fencing itself after node2 being fenced
Asgaroth
lists at blueface.com
Mon Feb 10 11:46:08 UTC 2014
Hi All,
OK, here is my testing with cman/clvmd enabled at system startup and clvmd
outside of Pacemaker control. I still seem to be getting the clvmd hang/fail
situation even when clvmd runs outside of Pacemaker control. I cannot see
offhand where the issue is occurring, but maybe it is related to what
Vladislav was saying: clvmd hangs if it is not running on a cluster node
that has cman running. However, I have both cman and clvmd enabled to start
at boot. Here is a short synopsis of what appears to be happening:
[1] Everything is fine here; both nodes are up and running:
# cman_tool nodes
Node Sts Inc Joined Name
1 M 444 2014-02-07 10:25:00 test01
2 M 440 2014-02-07 10:25:00 test02
# dlm_tool ls
dlm lockspaces
name clvmd
id 0x4104eefa
flags 0x00000000
change member 2 joined 1 remove 0 failed 0 seq 1,1
members 1 2
[2] Here I run "echo c > /proc/sysrq-trigger" on node2 (test02). I can see
crm_mon reporting node 2 as unclean, and fencing kicks in (node 2 is
rebooted):
# cman_tool nodes
Node Sts Inc Joined Name
1 M 440 2014-02-07 10:27:58 test01
2 X 444 test02
# dlm_tool ls
dlm lockspaces
name clvmd
id 0x4104eefa
flags 0x00000004 kern_stop
change member 2 joined 1 remove 0 failed 0 seq 2,2
members 1 2
new change member 1 joined 0 remove 1 failed 1 seq 3,3
new status wait_messages 0 wait_condition 1 fencing
new members 1
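To check the `cman_tool nodes` table from step [2] without eyeballing it, a minimal sketch like the one below could be used. It assumes the column layout shown in this thread (Node, Sts, Inc, Joined, Name, with Joined left blank for a node that is down), treats "M" as "member" and anything else as "not a member", and reads Inc as an incarnation counter; `parse_cman_nodes` and `non_members` are hypothetical helpers that only parse captured text, not part of any cman tooling.

```python
from dataclasses import dataclass


@dataclass
class CmanNode:
    node_id: int
    status: str       # assumption: "M" = cluster member, "X" = not a member (e.g. fenced)
    incarnation: int  # the "Inc" column
    joined: str       # empty when the node is not currently a member
    name: str


def parse_cman_nodes(text: str) -> list[CmanNode]:
    """Parse captured `cman_tool nodes` output (hypothetical helper)."""
    nodes = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header row, the shell prompt line and anything too short.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        nodes.append(CmanNode(
            node_id=int(fields[0]),
            status=fields[1],
            incarnation=int(fields[2]),
            # Joined is a date plus a time, or nothing at all for a dead node.
            joined=" ".join(fields[3:-1]),
            name=fields[-1],
        ))
    return nodes


def non_members(nodes: list[CmanNode]) -> list[str]:
    """Names of nodes whose status is anything other than "M"."""
    return [n.name for n in nodes if n.status != "M"]


if __name__ == "__main__":
    step2 = """# cman_tool nodes
Node Sts Inc Joined Name
1 M 440 2014-02-07 10:27:58 test01
2 X 444 test02
"""
    print(non_members(parse_cman_nodes(step2)))
```

Splitting on whitespace and taking the name from the last field keeps the parser working whether or not the Joined column is filled in.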
[3] So the output in [2] looks fine so far to my untrained eye: dlm is in
the kern_stop state while waiting for a successful fence. The node then
reboots and we have the following state:
# cman_tool nodes
Node Sts Inc Joined Name
1 M 440 2014-02-07 10:27:58 test01
2 M 456 2014-02-07 10:35:42 test02
# dlm_tool ls
dlm lockspaces
name clvmd
id 0x4104eefa
flags 0x00000000
change member 2 joined 1 remove 0 failed 0 seq 4,4
members 1 2
So it looks like dlm and cman are working properly (again, I could be
wrong, untrained eye and all :) ).
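The "dlm looks settled" judgement can be sketched the same way. This minimal, hypothetical helper assumes the `dlm_tool ls` layout shown above: a `flags` line whose symbolic names (such as `kern_stop`) follow the hex value, and `new ...` lines that appear only while a membership change is still in progress. It checks the DLM side only and says nothing about whether clvmd itself is responsive.

```python
def parse_dlm_ls(text: str) -> dict[str, dict]:
    """Split captured `dlm_tool ls` output into per-lockspace summaries (hypothetical helper)."""
    spaces: dict[str, dict] = {}
    current = None
    for raw in text.strip().splitlines():
        line = raw.strip()
        if line.startswith("name "):
            current = {"flags": [], "members": [], "pending": []}
            spaces[line.split(None, 1)[1]] = current
        elif current is None:
            continue  # prompt line or the "dlm lockspaces" banner
        elif line.startswith("flags "):
            # e.g. "flags 0x00000004 kern_stop": keep the names after the hex value
            current["flags"] = line.split()[2:]
        elif line.startswith("members "):
            current["members"] = [int(x) for x in line.split()[1:]]
        elif line.startswith("new "):
            # "new change/status/members": a change that has not completed yet
            current["pending"].append(line[len("new "):])
    return spaces


def lockspace_is_settled(info: dict) -> bool:
    """True when recovery is not stopped and no membership change is pending."""
    return "kern_stop" not in info["flags"] and not info["pending"]


if __name__ == "__main__":
    during_fence = """dlm lockspaces
name clvmd
id 0x4104eefa
flags 0x00000004 kern_stop
change member 2 joined 1 remove 0 failed 0 seq 2,2
members 1 2
new change member 1 joined 0 remove 1 failed 1 seq 3,3
new status wait_messages 0 wait_condition 1 fencing
new members 1
"""
    info = parse_dlm_ls(during_fence)["clvmd"]
    print(lockspace_is_settled(info), info["pending"])
```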
However, if I try to run any lvm/clvm status commands, they still just
hang. Could this be related to clvmd doing a check when cman is up and
running but clvmd has not started yet (as I understand from Vladislav's
previous email)? Or do I have something fundamentally wrong with my fencing
configuration?
Here is a link to the "dlm_tool dump" output taken at the time of the above
"dlm_tool ls" (if it helps):
http://pastebin.com/KV6YZWrN
Again, thanks for all the info thus far.
Thanks