[ClusterLabs] [EXT] Problem with DLM

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue Jul 26 14:06:53 EDT 2022


Hi Bernd!

I think the answer may lie some time before the timeout was reported; maybe a
network issue, or a very high load? It's hard to say from these logs alone...
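
As a rough sketch of how one might pull out that earlier window for inspection
(Python; it assumes the ISO 8601 timestamps seen in /var/log/messages, and the
name lines_before and the 10-minute default are illustrative choices, not part
of any cluster tool):

```python
from datetime import datetime, timedelta


def lines_before(lines, event_time, window_minutes=10):
    """Return the log lines stamped within window_minutes before event_time.

    Assumes each line starts with an ISO 8601 timestamp followed by a space,
    e.g. "2022-07-20T00:21:56.644677+02:00 host ...". Lines that do not start
    with such a timestamp are skipped.
    """
    start = event_time - timedelta(minutes=window_minutes)
    selected = []
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # wrapped continuation line or a different log format
        # Naive and timezone-aware datetimes cannot be compared; skip mismatches.
        if (when.tzinfo is None) != (event_time.tzinfo is None):
            continue
        if start <= when <= event_time:
            selected.append(line)
    return selected
```

For the timeout above, event_time would be something like
datetime.fromisoformat("2022-07-20T00:21:56+02:00"), with the lines read from
/var/log/messages; the +02:00 offset is taken from the syslog excerpt and is an
assumption for the corosync.log timestamp, which carries no offset.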


>>> On 26.07.2022 at 15:32, in message <6ABA7762.4E4 : 205 : 62692>, "Lentes,
Bernd" <bernd.lentes at helmholtz-muenchen.de> wrote:
Hi,

it seems my DLM went crazy:

/var/log/cluster/corosync.log:
Jul 20 00:21:56 [12204] ha-idg-1 lrmd: warning: child_timeout_callback:
dlm_monitor_30000 process (PID 11816) timed out
Jul 20 00:21:56 [12204] ha-idg-1 lrmd: warning: operation_finished:
dlm_monitor_30000:11816 - timed out after 20000ms
Jul 20 00:21:56 [32512] ha-idg-1 crmd: error: process_lrm_event: Result of
monitor operation for dlm on ha-idg-1: Timed Out | call=1255
key=dlm_monitor_30000 timeout=20000ms
Jul 20 00:21:56 [32512] ha-idg-1 crmd: info: exec_alert_list: Sending resource
alert via smtp_alert to informatic.idg at helmholtz-muenchen.de

/var/log/messages:
2022-07-20T00:21:56.644677+02:00 ha-idg-1 Cluster: alert_smtp.sh
2022-07-20T00:22:16.076936+02:00 ha-idg-1 kernel: [2366794.757496] dlm:
FD5D3C7CE9104CF5916A84DA0DBED302: leaving the lockspace group...
2022-07-20T00:22:16.364971+02:00 ha-idg-1 kernel: [2366795.045657] dlm:
FD5D3C7CE9104CF5916A84DA0DBED302: group event done 0 0
2022-07-20T00:22:16.364982+02:00 ha-idg-1 kernel: [2366795.045777] dlm:
FD5D3C7CE9104CF5916A84DA0DBED302: release_lockspace final free
2022-07-20T00:22:15.533571+02:00 ha-idg-1 Cluster: message repeated 22 times: [
alert_smtp.sh]
2022-07-20T00:22:17.164442+02:00 ha-idg-1 ocfs2_hb_ctl[19106]: ocfs2_hb_ctl
/sbin/ocfs2_hb_ctl -K -u FD5D3C7CE9104CF5916A84DA0DBED302
2022-07-20T00:22:18.904936+02:00 ha-idg-1 kernel: [2366797.586278] ocfs2:
Unmounting device (254,24) on (node 1084777482)
2022-07-20T00:22:19.116701+02:00 ha-idg-1 Cluster: alert_smtp.sh

What do these kernel messages mean? Why did DLM stop? I think this is the
second time this has happened. It is really a show stopper because the node is
fenced some minutes later:
00:34:40.709002 ha-idg: Fencing Operation Off of ha-idg-1 by ha-idg-2 for
crmd.28253 at ha-idg-2: OK (ref=9710f0e2-a9a9-42c3-a294-ed0bd78bba1a)

What can I do? Is there an alternative DLM?
The system is SLES 12 SP5. Should I update to SLES 15 SP3?

Bernd



--
Bernd Lentes
System Administrator
Institute for Metabolism and Cell Death (MCD)
Building 25 - office 122
HelmholtzZentrum München
bernd.lentes at helmholtz-muenchen.de
phone: +49 89 3187 1241
+49 89 3187 49123
fax: +49 89 3187 2294
http://www.helmholtz-muenchen.de/mcd

Public key:
30 82 01 0a 02 82 01 01 00 b3 72 3e ce 2c 0a 6f 58 49 2c 92 23 c7 b9 c1 ff 6c
3a 53 be f7 9e e9 24 b7 49 fa 3c e8 de 28 85 2c d3 ed f7 70 03 3f 4d 82 fc cc
96 4f 18 27 1f df 25 b3 13 00 db 4b 1d ec 7f 1b cf f9 cd e8 5b 1f 11 b3 a7 48
f8 c8 37 ed 41 ff 18 9f d7 83 51 a9 bd 86 c2 32 b3 d6 2d 77 ff 32 83 92 67 9e
ae ae 9c 99 ce 42 27 6f bf d8 c2 a1 54 fd 2b 6b 12 65 0e 8a 79 56 be 53 89 70
51 02 6a eb 76 b8 92 25 2d 88 aa 57 08 42 ef 57 fb fe 00 71 8e 90 ef b2 e3 22
f3 34 4f 7b f1 c4 b1 7c 2f 1d 6f bd c8 a6 a1 1f 25 f3 e4 4b 6a 23 d3 d2 fa 27
ae 97 80 a3 f0 5a c4 50 4a 45 e3 45 4d 82 9f 8b 87 90 d0 f9 92 2d a7 d2 67 53
e6 ae 1e 72 3e e9 e0 c9 d3 1c 23 e0 75 78 4a 45 60 94 f8 e3 03 0b 09 85 08 d0
6c f3 ff ce fa 50 25 d9 da 81 7b 2a dc 9e 28 8b 83 04 b4 0a 9f 37 b8 ac 58 f1
38 43 0e 72 af 02 03 01 00 01


