[ClusterLabs] Q: What is lvmlockd locking?

Thu Jan 21 04:53:42 EST 2021

Hi!

I have a problem: for testing I had configured lvmlockd. Now that the tests have ended, LVM is no longer used for any cluster resources, but lvmlockd is still configured.
Unfortunately I ran into this problem:
One OCFS2 mount was unmounted successfully, while another one, holding the lockspace for lvmlockd, is still active.
lvmlockd shuts down. At least it says so.

Unfortunately that stop never succeeds (runs into a timeout).

My suspicion is something like this:
Some non-LVM lock exists for the now unmounted OCFS2 filesystem.
lvmlockd wants to access that filesystem for unknown reasons.

I don't understand what's going on.

The events at node shutdown were:
Some Xen PVM was live-migrated successfully to another node, but during that there was a message like this:
Jan 21 10:20:13 h19 virtlockd[41990]: libvirt version: 6.0.0
Jan 21 10:20:13 h19 virtlockd[41990]: hostname: h19
Jan 21 10:20:13 h19 virtlockd[41990]: resource busy: Lockspace resource '4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5' is not locked
Jan 21 10:20:13 h19 libvirtd[41991]: libvirt version: 6.0.0
Jan 21 10:20:13 h19 libvirtd[41991]: hostname: h19
Jan 21 10:20:13 h19 libvirtd[41991]: resource busy: Lockspace resource '4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5' is not locked
Jan 21 10:20:13 h19 libvirtd[41991]: Unable to release lease on test-jeos4
Jan 21 10:20:13 h19 VirtualDomain(prm_xen_test-jeos4)[32786]: INFO: test-jeos4: live migration to h18 succeeded.

Unfortunately the log message makes it practically impossible to guess what the locked object actually is (it seems to be an indirect lock that uses a SHA256 hash as the resource name).
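
One way to narrow down the locked object would be to hash candidate paths and compare them with the logged name. This is only a minimal sketch, assuming that virtlockd derives the resource name as the SHA-256 of the fully qualified disk path (as the libvirt documentation describes for indirect lockspaces). The candidate paths below are hypothetical and would have to be replaced with the disk sources from the domain configuration.

```python
import hashlib

# Resource name copied from the virtlockd message above.
LOGGED = "4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5"


def resource_name(path: str) -> str:
    """Return the hex SHA-256 digest of a path (assumed naming scheme)."""
    return hashlib.sha256(path.encode("utf-8")).hexdigest()


def find_locked_path(candidates, logged=LOGGED):
    """Return the first candidate whose digest equals the logged name, else None."""
    for path in candidates:
        if resource_name(path) == logged:
            return path
    return None


if __name__ == "__main__":
    # Hypothetical image paths, for illustration only.
    candidates = ["/cfs/VMI/test-jeos4.img", "/cfs/VMI/test-jeos4.qcow2"]
    print(find_locked_path(candidates) or "no candidate matches")
```

If no candidate matches, the assumption about the naming scheme (or the list of candidates) is probably wrong.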

Then the OCFS2 filesystem for the VM images unmounts successfully while the stop of lvmlockd is still in progress:
Jan 21 10:20:16 h19 lvmlockd(prm_lvmlockd)[32945]: INFO: stop the lockspaces of shared VG(s)...
...
Jan 21 10:21:56 h19 pacemaker-controld[42493]:  error: Result of stop operation for prm_lvmlockd on h19: Timed Out
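
To see how long the stop operation ran before it was declared timed out, the two syslog timestamps above can be subtracted. A small sketch, assuming both messages are from the same day in 2021 (syslog omits the year) and an English locale for the month name; whether the result equals the configured stop timeout of prm_lvmlockd is a guess, as the resource configuration is not shown here.

```python
from datetime import datetime


def syslog_time(stamp: str, year: int = 2021) -> datetime:
    """Parse a syslog timestamp such as 'Jan 21 10:20:16' (the year is assumed)."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


started = syslog_time("Jan 21 10:20:16")    # "stop the lockspaces of shared VG(s)..."
timed_out = syslog_time("Jan 21 10:21:56")  # "Result of stop operation ...: Timed Out"

# Seconds between the start of the stop and the timeout message.
elapsed = (timed_out - started).total_seconds()
print(f"stop ran for {elapsed:.0f} seconds before timing out")
```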

As said before: I don't have shared VGs any more. I don't understand.

On a node without VMs running I see:
h19:~ # lvmlockctl -d
1611221190 lvmlockd started
1611221190 No lockspaces found to adopt
1611222560 new cl 1 pi 2 fd 8
1611222560 recv client[10817] cl 1 dump_info . "" mode iv flags 0
1611222560 send client[10817] cl 1 dump result 0 dump_len 149
1611222560 send_dump_buf delay 0 total 149
1611222560 close client[10817] cl 1 fd 8
1611222563 new cl 2 pi 2 fd 8
1611222563 recv client[10818] cl 2 dump_log . "" mode iv flags 0

On a node with VMs running I see:
h16:~ # lvmlockctl -d
1611216942 lvmlockd started
1611216942 No lockspaces found to adopt
1611221684 new cl 1 pi 2 fd 8
1611221684 recv pvs[17159] cl 1 lock gl "" mode sh flags 0
1611221684 lockspace "lvm_global" not found for dlm gl, adding...
1611221684 add_lockspace_thread dlm lvm_global version 0
1611221684 S lvm_global lm_add_lockspace dlm wait 0 adopt 0
1611221685 S lvm_global lm_add_lockspace done 0
1611221685 S lvm_global R GLLK action lock sh
1611221685 S lvm_global R GLLK res_lock cl 1 mode sh
1611221685 S lvm_global R GLLK lock_dlm
1611221685 S lvm_global R GLLK res_lock rv 0 read vb 0 0 0
1611221685 S lvm_global R GLLK res_lock all versions zero
1611221685 S lvm_global R GLLK res_lock invalidate global state
1611221685 send pvs[17159] cl 1 lock gl rv 0
1611221685 recv pvs[17159] cl 1 lock vg "sys" mode sh flags 0
1611221685 lockspace "lvm_sys" not found
1611221685 send pvs[17159] cl 1 lock vg rv -210 ENOLS
1611221685 close pvs[17159] cl 1 fd 8
1611221685 S lvm_global R GLLK res_unlock cl 1 from close
1611221685 S lvm_global R GLLK unlock_dlm
1611221685 S lvm_global R GLLK res_unlock lm done
1611222582 new cl 2 pi 2 fd 8
1611222582 recv client[19210] cl 2 dump_log . "" mode iv flags 0
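
To correlate these dump lines with the syslog messages, the leading number can be converted to a readable time. A minimal sketch, assuming the first field of each `lvmlockctl -d` line is a Unix timestamp in seconds (the values are consistent with 21 January 2021); the helper name is made up for illustration.

```python
from datetime import datetime, timezone


def dump_line_time(line: str) -> datetime:
    """Convert the leading field of a dump line to a UTC datetime.

    Assumes the field is seconds since the Unix epoch.
    """
    epoch = int(line.split(maxsplit=1)[0])
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


if __name__ == "__main__":
    for line in (
        "1611221190 lvmlockd started",
        '1611221685 lockspace "lvm_sys" not found',
    ):
        # Print the UTC time followed by the original line.
        print(dump_line_time(line).isoformat(), line, sep="  ")
```

The times are printed in UTC, so the local offset of the nodes has to be added before comparing them with syslog.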

Note: "lvm_sys" may refer to VG sys used for the hypervisor.

Regards,
Ulrich