[ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Fri Jul 9 03:56:43 EDT 2021
Hi!
An update on the issue:
SUSE support found out that the reason for the hanging processes is a deadlock caused by a race condition (Kernel 5.3.18-24.64-default). Support is working on a fix.
Today the cluster "fixed" the problem in an unusual way:
h19 kernel: Out of memory: Killed process 6838 (corosync) total-vm:261212kB, anon-rss:31444kB, file-rss:7700kB, shmem-rss:121872kB
I doubt that was the best possible choice ;-)
The dead corosync caused the DC (h18) to fence h19 (which was successful), but the DC was fenced while it tried to recover resources, so the complete cluster rebooted.
Regards,
Ulrich
More information about the Users
mailing list