[ClusterLabs] Q: What is lvmlockd locking?

Gang He ghe at suse.com
Thu Jan 21 05:30:28 EST 2021


Hi Ulrich,

Is the problem reliably reproducible? Could you help to share your 
Pacemaker CRM configuration and version information for your OS, 
lvm2, and resource-agents packages?
I suspect the problem was probably caused by the lvmlockd resource 
agent script, which may not handle this corner case correctly.
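For example, something like this should capture the details I need 
(assuming crmsh and an RPM-based distribution; package names may vary):

  crm configure show
  rpm -q pacemaker lvm2 lvm2-lockd resource-agents
  cat /etc/os-release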

Thanks
Gang


On 2021/1/21 17:53, Ulrich Windl wrote:
> Hi!
> 
> I have a problem: For tests I had configured lvmlockd. Now that the tests have ended, no LVM is used for cluster resources any more, but lvmlockd is still configured.
> Unfortunately I ran into this problem:
> One OCFS2 mount was unmounted successfully; another one, holding the lockspace for lvmlockd, is still active.
> lvmlockd shuts down. At least it says so.
> 
> Unfortunately that stop never succeeds (runs into a timeout).
> 
> My suspect is something like this:
> Some non-LVM lock exists for the now unmounted OCFS2 filesystem.
> lvmlockd wants to access that filesystem for unknown reasons.
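> 
> (I guess the remaining lockspaces could be listed with something like this, assuming the dlm userland tools and lvmlockctl are installed:)
> h19:~ # dlm_tool ls
> h19:~ # lvmlockctl -i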
> 
> I don't understand what's going on.
> 
> The events at node shutdown were:
> Some Xen PVM was live-migrated successfully to another node, but during that there was a message like this:
> Jan 21 10:20:13 h19 virtlockd[41990]: libvirt version: 6.0.0
> Jan 21 10:20:13 h19 virtlockd[41990]: hostname: h19
> Jan 21 10:20:13 h19 virtlockd[41990]: resource busy: Lockspace resource '4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5' is not locked
> Jan 21 10:20:13 h19 libvirtd[41991]: libvirt version: 6.0.0
> Jan 21 10:20:13 h19 libvirtd[41991]: hostname: h19
> Jan 21 10:20:13 h19 libvirtd[41991]: resource busy: Lockspace resource '4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5' is not locked
> Jan 21 10:20:13 h19 libvirtd[41991]: Unable to release lease on test-jeos4
> Jan 21 10:20:13 h19 VirtualDomain(prm_xen_test-jeos4)[32786]: INFO: test-jeos4: live migration to h18 succeeded.
> 
> Unfortunately the log message makes it practically impossible to guess what the locked object actually is (an indirect lock using a SHA256 hash, it seems).
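> 
> (If the resource name is the SHA256 of the disk image path, which I assume virtlockd uses in its default lockspace mode, one could try hashing candidate paths to match it; the path below is made up:)
> h19:~ # echo -n /var/lib/libvirt/images/test-jeos4.qcow2 | sha256sum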
> 
> Then the OCFS2 filesystem for the VM images unmounts successfully while the stop of lvmlockd is still busy:
> Jan 21 10:20:16 h19 lvmlockd(prm_lvmlockd)[32945]: INFO: stop the lockspaces of shared VG(s)...
> ...
> Jan 21 10:21:56 h19 pacemaker-controld[42493]:  error: Result of stop operation for prm_lvmlockd on h19: Timed Out
> 
> As said before: I don't have shared VGs any more. I don't understand.
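> 
> (To rule out leftover shared VGs I'd compare the lock types and try a manual lock-stop; assuming a current lvm2, something like:)
> h19:~ # vgs -o+locktype,lockargs
> h19:~ # vgchange --lockstop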
> 
> On a node without VMs running I see:
> h19:~ # lvmlockctl -d
> 1611221190 lvmlockd started
> 1611221190 No lockspaces found to adopt
> 1611222560 new cl 1 pi 2 fd 8
> 1611222560 recv client[10817] cl 1 dump_info . "" mode iv flags 0
> 1611222560 send client[10817] cl 1 dump result 0 dump_len 149
> 1611222560 send_dump_buf delay 0 total 149
> 1611222560 close client[10817] cl 1 fd 8
> 1611222563 new cl 2 pi 2 fd 8
> 1611222563 recv client[10818] cl 2 dump_log . "" mode iv flags 0
> 
> On a node with VMs running I see:
> h16:~ # lvmlockctl -d
> 1611216942 lvmlockd started
> 1611216942 No lockspaces found to adopt
> 1611221684 new cl 1 pi 2 fd 8
> 1611221684 recv pvs[17159] cl 1 lock gl "" mode sh flags 0
> 1611221684 lockspace "lvm_global" not found for dlm gl, adding...
> 1611221684 add_lockspace_thread dlm lvm_global version 0
> 1611221684 S lvm_global lm_add_lockspace dlm wait 0 adopt 0
> 1611221685 S lvm_global lm_add_lockspace done 0
> 1611221685 S lvm_global R GLLK action lock sh
> 1611221685 S lvm_global R GLLK res_lock cl 1 mode sh
> 1611221685 S lvm_global R GLLK lock_dlm
> 1611221685 S lvm_global R GLLK res_lock rv 0 read vb 0 0 0
> 1611221685 S lvm_global R GLLK res_lock all versions zero
> 1611221685 S lvm_global R GLLK res_lock invalidate global state
> 1611221685 send pvs[17159] cl 1 lock gl rv 0
> 1611221685 recv pvs[17159] cl 1 lock vg "sys" mode sh flags 0
> 1611221685 lockspace "lvm_sys" not found
> 1611221685 send pvs[17159] cl 1 lock vg rv -210 ENOLS
> 1611221685 close pvs[17159] cl 1 fd 8
> 1611221685 S lvm_global R GLLK res_unlock cl 1 from close
> 1611221685 S lvm_global R GLLK unlock_dlm
> 1611221685 S lvm_global R GLLK res_unlock lm done
> 1611222582 new cl 2 pi 2 fd 8
> 1611222582 recv client[19210] cl 2 dump_log . "" mode iv flags 0
> 
> Note: "lvm_sys" may refer to VG sys used for the hypervisor.
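> 
> (I read the -210 ENOLS above as pvs requesting the VG lock for "sys" while no lockspace "lvm_sys" is started; if sys is a local, non-shared VG, as I assume, that should be harmless. Its lock type could be confirmed with:)
> h16:~ # vgs -o+locktype sys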
> 
> Regards,
> Ulrich