[ClusterLabs] GFS2 problem after host name change

Mon Jan 15 10:35:28 EST 2018

On 2018-01-15 05:45 AM, Bob Peterson wrote:
> ----- Original Message -----
> | I recently changed the host name of a cluster. It may or may not be
> | related, but afterwards I noticed that I can cleanly start gfs2 when the
> | node boots. However, if the node is withdrawn and I then try to rejoin it
> | without a reboot, it hangs with this in syslog:
> (snip)
> | gfs2_consist_inode_i+0x5d/0x60 [gfs2]
> | find_good_lh+0x76/0x90 [gfs2]
> | gfs2_find_jhead+0x89/0x170 [gfs2]
> 
> Hi,
> 
> Hm. That is weird. And highly disturbing.
> 
> It indicates GFS2 journal recovery found file system corruption within
> the journal. It's possible that gfs_controld did something weird with the
> GFS2 journal assignments because of the group membership change that
> resulted from the node rename. But that should still not cause any kind
> of journal corruption, since the journals are still policed by the glocks,
> and even recovery and mount should lock each other out.
> 
> I'd be interested if we can find a reproducer for this.
> 
> Regards,
> 
> Bob Peterson
> Red Hat File Systems

I will see if I can reproduce it later this week. Note that I changed the
domain portion of the hostname, not the short host name (i.e.,
kp-a10n01.example.org -> kp-a10n01.foo.org).
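
To make that distinction concrete, here is a minimal sketch of which part
of the name changed. The helper name split_fqdn is hypothetical and used
only for illustration; it is not part of any cluster tool, and it simply
treats everything after the first dot as the domain.

```python
def split_fqdn(fqdn):
    """Split a fully qualified name into (short host name, domain).

    Assumption: the short host name is everything before the first dot.
    """
    short, _, domain = fqdn.partition(".")
    return short, domain


# The two names from the rename described above.
old = split_fqdn("kp-a10n01.example.org")
new = split_fqdn("kp-a10n01.foo.org")

# Compare each part separately to see what the rename touched.
short_changed = old[0] != new[0]
domain_changed = old[1] != new[1]

print("short host name changed:", short_changed)
print("domain changed:", domain_changed)
```

Anything that keys on the short host name sees no change here; only
things that use the full name would see a different node.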

As I mentioned in my reply to Ulrich, I ran fsck.gfs2 after a fresh
boot, before starting gfs2, and it found no issues. Note also that this
error persisted after reformatting the partition, and after deleting and
recreating the LV (with a different size). Whatever broke, broke deep.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould