[ClusterLabs] gfs2: fsid=xxxx:work.3: fatal: filesystem consistency error
Gang He
GHe at suse.com
Mon Oct 21 22:36:24 EDT 2019
Hi Bob,
> -----Original Message-----
> From: Users [mailto:users-bounces at clusterlabs.org] On Behalf Of Bob
> Peterson
> Sent: October 21, 2019 21:02
> To: Cluster Labs - All topics related to open-source clustering welcomed
> <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] gfs2: fsid=xxxx:work.3: fatal: filesystem consistency
> error
>
> ----- Original Message -----
> > Hello List,
> >
> > I got a gfs2 file system consistency error report from one user, who is
> > running kernel 4.12.14-95.29-default on SLE12SP4 (x86_64).
> > The error message is as below,
> > 2019-09-26T10:22:10.333792+02:00 node4 kernel: [ 3456.176234] gfs2: fsid=xxxx:work.3: fatal: filesystem consistency error
> > 2019-09-26T10:22:10.333806+02:00 node4 kernel: [ 3456.176234]   inode = 280342097926
> > 2019-09-26T10:22:10.333807+02:00 node4 kernel: [ 3456.176234]   function = gfs2_dinode_dealloc, file = ../fs/gfs2/super.c, line = 1459
> > 2019-09-26T10:22:10.333808+02:00 node4 kernel: [ 3456.176235] gfs2: fsid=xxxx:work.3: about to withdraw this file system
> >
> > I looked at the super.c file; the related code is:
> > 1451 static int gfs2_dinode_dealloc(struct gfs2_inode *ip)
> > 1452 {
> > 1453 struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
> > 1454 struct gfs2_rgrpd *rgd;
> > 1455 struct gfs2_holder gh;
> > 1456 int error;
> > 1457
> > 1458 if (gfs2_get_inode_blocks(&ip->i_inode) != 1) {
> > 1459 gfs2_consist_inode(ip); <<== here
> > 1460 return -EIO;
> > 1461 }
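> >
> > (For reference, gfs2_get_inode_blocks() just converts the VFS i_blocks
> > count, which is kept in 512-byte units, into filesystem blocks. A rough
> > sketch from my reading of fs/gfs2/inode.h; the exact code may differ:)
> >
> > static inline u64 gfs2_get_inode_blocks(const struct inode *inode)
> > {
> >         /* i_blocks is in 512-byte sectors; shift down to fs blocks */
> >         return inode->i_blocks >> (inode->i_blkbits - GFS2_BASIC_BLOCK_SHIFT);
> > }
> >
> > So the check at line 1458 fires when a dinode that is about to be
> > deallocated still accounts for anything other than its own single block.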
> >
> >
> > It looks like upstream may have already fixed this bug. Can anyone help
> > point out which patches need to be back-ported?
> >
> > Thanks
> > Gang
>
> Hi,
>
> Yes, we have made lots of patches since the 4.12 kernel, some of which may
> be relevant. However, that error often indicates file system corruption.
> (It means the block count for a dinode became corrupt.)
>
> I've been working on a set of problems caused whenever gfs2 replays one of
> its journals during recovery, with a wide variety of symptoms, including that
> one. So it might be one of those. Some of my resulting patches are already
> pushed to upstream, but I'm not yet at the point where I can push them all.
>
> I recommend doing a fsck.gfs2 on the volume to ensure consistency.
The customer repaired the filesystem with fsck.gfs2, but every time the application workload starts (concurrent writing),
the filesystem becomes inaccessible again. That also makes the stop operation of the application resource fail, which in turn triggers a fence.
Do you have any suggestions for this case? It looks like there is a serious bug under concurrent writing with some stress.
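
For what it is worth, the failing workload is essentially several processes
(possibly on more than one node) appending to files under the same GFS2 mount
at the same time. Below is a simplified sketch of that pattern; the mount
point, file names, process count and write sizes are made up for illustration
and are not the customer's actual application:

/* Rough concurrent-write stress sketch; MOUNT, NPROC and NWRITE are
 * illustrative values only. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define MOUNT  "/mnt/work"   /* hypothetical GFS2 mount point */
#define NPROC  8             /* concurrent writer processes */
#define NWRITE 100000        /* 4 KiB appends per process */

int main(void)
{
        char buf[4096];
        memset(buf, 'x', sizeof(buf));

        for (int i = 0; i < NPROC; i++) {
                if (fork() == 0) {
                        char path[128];
                        snprintf(path, sizeof(path), MOUNT "/file.%d", i);
                        int fd = open(path, O_CREAT | O_WRONLY | O_APPEND, 0644);
                        if (fd < 0) {
                                perror("open");
                                exit(1);
                        }
                        for (int j = 0; j < NWRITE; j++) {
                                if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
                                        perror("write");
                                        exit(1);
                                }
                        }
                        close(fd);
                        exit(0);
                }
        }
        for (int i = 0; i < NPROC; i++)
                wait(NULL);
        return 0;
}

This is only meant to show the rough shape of the workload; the real
application is of course more complex.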
Thanks
Gang
>
> Regards,
>
> Bob Peterson
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/