[Pacemaker] clvmd hangs on node1 if node2 is fenced

Michael Smith msmith at cbnco.com
Thu Aug 26 18:50:25 EDT 2010


 > Xinwei Hu <hxinwei at ...> writes:
 >
 > > That sounds worrying actually.
 > > I think this is logged as bug 585419 on SLES' bugzilla.
 > > If you can reproduce this issue, it would be worth reopening, I think.

I've got a pair of fully patched SLES11 SP1 nodes and they're showing 
what I guess is the same behaviour: if I hard-poweroff node2, operations 
like "vgdisplay -v" hang on node1 for quite some time. Sometimes a 
minute, sometimes two, sometimes forever. They get stuck here:

Aug 26 18:31:42 xen-test1 clvmd[8906]: doing PRE command LOCK_VG 'V_vm_store' at 1 (client=0x7f2714000b40)
Aug 26 18:31:42 xen-test1 clvmd[8906]: lock_resource 'V_vm_store', flags=0, mode=3


After a few seconds, corosync and dlm notice the node is gone, but
vgdisplay and friends still hang while trying to lock the VG.

Aug 26 18:31:44 xen-test1 corosync[8476]:  [TOTEM ] A processor failed, forming new configuration.
Aug 26 18:31:50 xen-test1 cluster-dlm[8870]: update_cluster: Processing membership 1260
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: dlm_process_node: Skipped active node 219878572: born-on=1256, last-seen=1260, this-event=1260, last-event=1256
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: del_configfs_node: del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/236655788"
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: dlm_process_node: Removed inactive node 236655788: born-on=1252, last-seen=1256, this-event=1260, last-event=1256
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: log_config: dlm:controld conf 1 0 1 memb 219878572 join left 236655788
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: log_config: dlm:ls:clvmd conf 1 0 1 memb 219878572 join left 236655788
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: add_change: clvmd add_change cg 3 remove nodeid 236655788 reason 3
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: add_change: clvmd add_change cg 3 counts member 1 joined 0 remove 1 failed 1
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: stop_kernel: clvmd stop_kernel cg 3
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: do_sysfs: write "0" to "/sys/kernel/dlm/clvmd/control"
Aug 26 18:31:51 xen-test1 kernel: [  365.267802] dlm: closing connection to node 236655788
Aug 26 18:31:51 xen-test1 clvmd[8906]: confchg callback. 0 joined, 1 left, 1 members
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: fence_node_time: Node 236655788/xen-test2 has not been shot yet
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: check_fencing_done: clvmd check_fencing 23665578 not fenced add 1282861615 fence 0
Aug 26 18:31:51 xen-test1 crmd: [8489]: info: ais_dispatch: Membership 1260: quorum still lost
Aug 26 18:31:51 xen-test1 cluster-dlm: [8870]: info: ais_dispatch: Membership 1260: quorum still lost
...
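
The last cluster-dlm lines above suggest that dlm is still waiting for
node2 to be fenced when the VG lock stalls. For anyone comparing their
own logs, here is a minimal Python sketch that pulls those two messages
out of a syslog excerpt. The regular expressions are copied from the
wording in this log only; the names pending_fencing and SAMPLE are made
up for illustration, and other cluster-dlm versions may phrase the
messages differently.

```python
import re

# Message wording taken from the log excerpt in this post (assumption:
# other cluster-dlm builds may format these lines differently).
NOT_SHOT = re.compile(
    r"fence_node_time: Node\s+(\d+)/(\S+)\s+has not been shot yet"
)
NOT_FENCED = re.compile(
    r"check_fencing_done: (\S+)\s+check_fencing (\d+) not fenced"
)

# Two lines from the log above, hard-wrapped the way a mail client
# would wrap them.
SAMPLE = """\
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: fence_node_time: Node
236655788/xen-test2 has not been shot yet
Aug 26 18:31:51 xen-test1 cluster-dlm[8870]: check_fencing_done: clvmd
check_fencing 23665578 not fenced add 1282861615 fence 0
"""


def pending_fencing(log_text):
    """Return (nodes, lockspaces) that the log reports as not yet fenced.

    nodes is a set of (nodeid, hostname) tuples from fence_node_time,
    lockspaces is a set of lockspace names from check_fencing_done.
    """
    # Collapse all whitespace so wrapped log lines still match.
    flat = " ".join(log_text.split())
    nodes = {(int(m.group(1)), m.group(2)) for m in NOT_SHOT.finditer(flat)}
    lockspaces = {m.group(1) for m in NOT_FENCED.finditer(flat)}
    return nodes, lockspaces


if __name__ == "__main__":
    nodes, lockspaces = pending_fencing(SAMPLE)
    print("nodes waiting to be fenced:", sorted(nodes))
    print("lockspaces blocked on fencing:", sorted(lockspaces))
```

A non-empty result only shows that the messages were logged; it does not
by itself prove that fencing is the cause of the hang.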

cluster-glue-1.0.5-0.5.1
corosync-1.2.1-0.5.1
kernel-xen-2.6.32.13-0.5.1
libcorosync4-1.2.1-0.5.1
lvm2-2.02.39-18.27.1
lvm2-clvm-2.02.39-18.27.1
multipath-tools-0.4.8-40.23.1

Thanks,
Mike




