[ClusterLabs] Antw: [EXT] Re: Q: What is lvmlockd locking?

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Jan 21 07:08:11 EST 2021


>>> Gang He <ghe at suse.com> wrote on 21.01.2021 at 11:30 in message
<59b543ee-0824-6b91-d0af-48f66922bc89 at suse.com>:
> Hi Ulrich,
> 
> Is the problem reproducible reliably? Could you share your Pacemaker CRM 
> configuration and the related OS/lvm2/resource-agents version 
> information?

OK, the problem occurred on every node, so I guess it's reproducible.
OS is SLES15 SP2 with all current updates (lvm2-2.03.05-8.18.1.x86_64,
pacemaker-2.0.4+20200616.2deceaa3a-3.3.1.x86_64,
resource-agents-4.4.0+git57.70549516-3.12.1.x86_64).

The configuration (somewhat trimmed) is attached.

The only VG the cluster node sees is:
ph16:~ # vgs
  VG  #PV #LV #SN Attr   VSize   VFree
  sys   1   3   0 wz--n- 222.50g    0
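
To double-check that nothing shared is left, the lock type of that VG could be inspected as well (just a sketch, not run on this node yet, assuming the vg_lock_type report field is available in this lvm2 version):

ph16:~ # vgs -o vg_name,vg_lock_type sys

An empty LockType column would confirm that "sys" is a purely local VG, i.e. there really is no shared VG left for lvmlockd to manage.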

Regards,
Ulrich

> I suspect the problem was caused by the lvmlockd resource agent script, 
> which does not handle this corner case correctly.
> 
> Thanks
> Gang
> 
> 
> On 2021/1/21 17:53, Ulrich Windl wrote:
>> Hi!
>> 
>> I have a problem: For tests I had configured lvmlockd. Now that the tests have ended, no LVM is used for cluster resources any more, but lvmlockd is still configured.
>> Unfortunately I ran into this problem:
>> One OCFS2 mount was unmounted successfully; another one, which holds the lockspace for lvmlockd, is still active.
>> lvmlockd shuts down. At least it says so.
>> 
>> Unfortunately that stop never succeeds (runs into a timeout).
>> 
>> My suspicion is something like this:
>> Some non-LVM lock exists for the now unmounted OCFS2 filesystem.
>> lvmlockd wants to access that filesystem for unknown reasons.
>> 
>> I don't understand what's going on.
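>> 
>> One thing I might try is to list the DLM lockspaces that are still around (just a sketch, assuming dlm_tool from the dlm package is installed):
>> 
>> h19:~ # dlm_tool ls
>> 
>> If only lvm_global (and no OCFS2 or per-VG lockspace) showed up there, a leftover filesystem lock would seem unlikely.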
>> 
>> The events at node shutdown were:
>> Some Xen PVM was live-migrated successfully to another node, but during that there was a message like this:
>> Jan 21 10:20:13 h19 virtlockd[41990]: libvirt version: 6.0.0
>> Jan 21 10:20:13 h19 virtlockd[41990]: hostname: h19
>> Jan 21 10:20:13 h19 virtlockd[41990]: resource busy: Lockspace resource '4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5' is not locked
>> Jan 21 10:20:13 h19 libvirtd[41991]: libvirt version: 6.0.0
>> Jan 21 10:20:13 h19 libvirtd[41991]: hostname: h19
>> Jan 21 10:20:13 h19 libvirtd[41991]: resource busy: Lockspace resource '4c6bebd1f4bc581255b422a65d317f31deef91f777e51ba0daf04419dda7ade5' is not locked
>> Jan 21 10:20:13 h19 libvirtd[41991]: Unable to release lease on test‑jeos4
>> Jan 21 10:20:13 h19 VirtualDomain(prm_xen_test-jeos4)[32786]: INFO: test-jeos4: live migration to h18 succeeded.
>> 
>> Unfortunately the log message makes it practically impossible to guess what the locked object actually is (it seems to be an indirect lock using a SHA256 hash).
>> 
>> Then the OCFS2 filesystem for the VM images unmounts successfully while the stop of lvmlockd is still busy:
>> Jan 21 10:20:16 h19 lvmlockd(prm_lvmlockd)[32945]: INFO: stop the lockspaces of shared VG(s)...
>> ...
>> Jan 21 10:21:56 h19 pacemaker-controld[42493]:  error: Result of stop operation for prm_lvmlockd on h19: Timed Out
>> 
>> As said before: I don't have shared VGs any more. I don't understand.
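>> 
>> If it helps, the lockspaces lvmlockd still holds can apparently be listed and dropped by hand with lvmlockctl (only a sketch based on the man page, I have not tried forcing it yet):
>> 
>> h19:~ # lvmlockctl --info              # show the lockspaces and locks lvmlockd still holds
>> h19:~ # lvmlockctl --stop-lockspaces   # ask lvmlockd to stop all of its lockspaces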
>> 
>> On a node without VMs running I see:
>> h19:~ # lvmlockctl ‑d
>> 1611221190 lvmlockd started
>> 1611221190 No lockspaces found to adopt
>> 1611222560 new cl 1 pi 2 fd 8
>> 1611222560 recv client[10817] cl 1 dump_info . "" mode iv flags 0
>> 1611222560 send client[10817] cl 1 dump result 0 dump_len 149
>> 1611222560 send_dump_buf delay 0 total 149
>> 1611222560 close client[10817] cl 1 fd 8
>> 1611222563 new cl 2 pi 2 fd 8
>> 1611222563 recv client[10818] cl 2 dump_log . "" mode iv flags 0
>> 
>> On a node with VMs running I see:
>> h16:~ # lvmlockctl ‑d
>> 1611216942 lvmlockd started
>> 1611216942 No lockspaces found to adopt
>> 1611221684 new cl 1 pi 2 fd 8
>> 1611221684 recv pvs[17159] cl 1 lock gl "" mode sh flags 0
>> 1611221684 lockspace "lvm_global" not found for dlm gl, adding...
>> 1611221684 add_lockspace_thread dlm lvm_global version 0
>> 1611221684 S lvm_global lm_add_lockspace dlm wait 0 adopt 0
>> 1611221685 S lvm_global lm_add_lockspace done 0
>> 1611221685 S lvm_global R GLLK action lock sh
>> 1611221685 S lvm_global R GLLK res_lock cl 1 mode sh
>> 1611221685 S lvm_global R GLLK lock_dlm
>> 1611221685 S lvm_global R GLLK res_lock rv 0 read vb 0 0 0
>> 1611221685 S lvm_global R GLLK res_lock all versions zero
>> 1611221685 S lvm_global R GLLK res_lock invalidate global state
>> 1611221685 send pvs[17159] cl 1 lock gl rv 0
>> 1611221685 recv pvs[17159] cl 1 lock vg "sys" mode sh flags 0
>> 1611221685 lockspace "lvm_sys" not found
>> 1611221685 send pvs[17159] cl 1 lock vg rv ‑210 ENOLS
>> 1611221685 close pvs[17159] cl 1 fd 8
>> 1611221685 S lvm_global R GLLK res_unlock cl 1 from close
>> 1611221685 S lvm_global R GLLK unlock_dlm
>> 1611221685 S lvm_global R GLLK res_unlock lm done
>> 1611222582 new cl 2 pi 2 fd 8
>> 1611222582 recv client[19210] cl 2 dump_log . "" mode iv flags 0
>> 
>> Note: "lvm_sys" may refer to VG sys used for the hypervisor.
>> 
>> Regards,
>> Ulrich
>> 



-------------- next part --------------
A non-text attachment was scrubbed...
Name: config
Type: application/octet-stream
Size: 6601 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20210121/daff6246/attachment.obj>

