[ClusterLabs] gfs2: fsid=xxxx:work.3: fatal: filesystem consistency error

Gang He GHe at suse.com
Mon Oct 21 22:36:24 EDT 2019


Hi Bob,

> -----Original Message-----
> From: Users [mailto:users-bounces at clusterlabs.org] On Behalf Of Bob
> Peterson
> Sent: Monday, October 21, 2019 21:02
> To: Cluster Labs - All topics related to open-source clustering welcomed
> <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] gfs2: fsid=xxxx:work.3: fatal: filesystem consistency
> error
> 
> ----- Original Message -----
> > Hello List,
> >
> > I got a gfs2 file system consistency error report from one user, who is
> > running kernel 4.12.14-95.29-default on SLE12 SP4 (x86_64).
> > The error messages are as below:
> > 2019-09-26T10:22:10.333792+02:00 node4 kernel: [ 3456.176234] gfs2:
> > fsid=xxxx:work.3: fatal: filesystem consistency error
> > 2019-09-26T10:22:10.333806+02:00 node4 kernel: [ 3456.176234]
> inode = 280
> > 342097926
> > 2019-09-26T10:22:10.333807+02:00 node4 kernel: [ 3456.176234]
> function =
> > gfs2_dinode_dealloc, file = ../fs/gfs2/super.c, line = 1459
> > 2019-09-26T10:22:10.333808+02:00 node4 kernel: [ 3456.176235] gfs2:
> > fsid=xxxx:work.3: about to withdraw this file system
> >
> > I looked at the super.c file; the related code is:
> > 1451 static int gfs2_dinode_dealloc(struct gfs2_inode *ip)
> > 1452 {
> > 1453         struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
> > 1454         struct gfs2_rgrpd *rgd;
> > 1455         struct gfs2_holder gh;
> > 1456         int error;
> > 1457
> > 1458         if (gfs2_get_inode_blocks(&ip->i_inode) != 1) {
> > 1459                 gfs2_consist_inode(ip);   <<== here
> > 1460                 return -EIO;
> > 1461         }
> >
> >
> > It looks like upstream may have already fixed this bug. Who can help point
> > out which patches need to be back-ported?
> >
> > Thanks
> > Gang
> 
> Hi,
> 
> Yes, we have made lots of patches since the 4.12 kernel, some of which may
> be relevant. However, that error often indicates file system corruption.
> (It means the block count for a dinode became corrupt.)
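
For my own understanding: if I read the 4.12 code correctly, gfs2_get_inode_blocks()
is just a small helper in fs/gfs2/inode.h along these lines (quoted from memory, so
treat it as a sketch rather than the exact source):

    static inline u64 gfs2_get_inode_blocks(const struct inode *inode)
    {
            /* i_blocks is kept in 512-byte units; convert it to fs blocks */
            return inode->i_blocks >> (GFS2_SB(inode)->sd_sb.sb_bsize_shift -
                                       GFS2_BASIC_BLOCK_SHIFT);
    }

So by the time gfs2_dinode_dealloc() runs, all data and metadata blocks should
already have been freed and only the dinode block itself should remain, which is
why any value other than 1 is treated as a corrupt block count and the file
system withdraws.
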
> 
> I've been working on a set of problems that are triggered whenever gfs2 replays
> one of its journals during recovery, with a wide variety of symptoms, including
> that one. So it might be one of those. Some of my resulting patches have already
> been pushed upstream, but I'm not yet at the point where I can push them all.
> 
> I recommend doing a fsck.gfs2 on the volume to ensure consistency.

The customer has repaired the file system with fsck.gfs2; however, every time the application workload starts (concurrent writes),
the file system becomes inaccessible. That also makes the stop operation of the application resource fail, which in turn leads to a fence.
Do you have any suggestions for this case? It looks like there is a serious bug under concurrent writes with some stress.
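
For reference, the repair/retest cycle on the customer side looks roughly like the
following (device path, mount point and writer count are only placeholders, not the
customer's real values):

    # run on one node only, with the file system unmounted on all nodes
    fsck.gfs2 -y /dev/vg_cluster/lv_work

    # remount and start a concurrent-write load, e.g. several parallel writers
    mount -t gfs2 /dev/vg_cluster/lv_work /mnt/work
    for i in $(seq 1 8); do
            dd if=/dev/zero of=/mnt/work/stress.$i bs=1M count=1024 &
    done
    wait

The real application does many parallel writes; the dd loop above is only meant to
approximate that pattern.
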

Thanks
Gang 

> 
> Regards,
> 
> Bob Peterson
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/

