[ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?
GHe at suse.com
Sun Jul 11 04:55:30 EDT 2021
Thanks for your update.
Based on feedback from upstream, there is a patch (ocfs2: initialize ip_next_orphan) that should fix this problem.
I can confirm that the issue the patch addresses looks very similar to your problem.
I will verify it next week, then let you know the result.
From: Users <users-bounces at clusterlabs.org> on behalf of Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
Sent: Friday, July 9, 2021 15:56
To: users at clusterlabs.org
Subject: [ClusterLabs] Antw: [EXT] Re: Antw: Hanging OCFS2 Filesystem any one else?
An update on the issue:
SUSE support found that the hanging processes are caused by a deadlock resulting from a race condition (kernel 5.3.18-24.64-default). Support is working on a fix.
Today the cluster "fixed" the problem in an unusual way:
h19 kernel: Out of memory: Killed process 6838 (corosync) total-vm:261212kB, anon-rss:31444kB, file-rss:7700kB, shmem-rss:121872kB
I doubt that was the best possible choice ;-)
The dead corosync caused the DC (h18) to fence h19 (which was successful), but the DC was itself fenced while it tried to recover resources, so the entire cluster rebooted.