[ClusterLabs] Re: [EXT] Automatic restart of Pacemaker after reboot and filesystem unmount problem
Grégory Sacré
gregory.sacre at s-clinica.com
Wed Jul 15 04:08:25 EDT 2020
Hello!
> I just wonder: Does "UNCLEAN (online)" mean you have no fencing configured?
I have set up SSH fencing just for testing purposes.
Kr,
Gregory
-----Original Message-----
From: Users <users-bounces at clusterlabs.org> On Behalf Of Ulrich Windl
Sent: 15 July 2020 09:26
To: users at clusterlabs.org
Subject: [ClusterLabs] Re: [EXT] Automatic restart of Pacemaker after reboot and filesystem unmount problem
Hi!
I just wonder: Does "UNCLEAN (online)" mean you have no fencing configured?
>>> Grégory Sacré <gregory.sacre at s-clinica.com> wrote on 14.07.2020 at 13:56 in message
<FE83686A028367468065016AC856B02107B7D745 at IAS-EX10-01.S-Clinica.int>:
> Dear all,
>
>
> I'm pretty new to Pacemaker, so I must be missing something, but I
> cannot find it in the documentation.
>
> I'm setting up a Samba file server cluster with DRBD and Pacemaker.
> Here are the relevant pcs commands for the mount part:
>
> user $ sudo pcs cluster cib fs_cfg
> user $ sudo pcs -f fs_cfg resource create VPSFSMount Filesystem \
>     device="/dev/drbd1" directory="/srv/vps-fs" fstype="gfs2" \
>     options="acl,noatime"
> Assumed agent name 'ocf:heartbeat:Filesystem' (deduced from 'Filesystem')
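[The status output below shows the mount as a clone (VPSFSMount-clone) running on both nodes, although the quoted commands stop at resource creation. A hedged sketch of the steps presumably used to clone the resource and push the shadow CIB; the resource and file names come from the message, while the DLM ordering/colocation constraints are an assumption based on the usual GFS2 setup:]

```shell
# Clone the Filesystem resource so GFS2 is mounted on both nodes
# (GFS2 is a cluster filesystem, so an active/active clone is the usual setup).
pcs -f fs_cfg resource clone VPSFSMount

# Assumption: GFS2 needs DLM running first, so order and colocate the clones.
pcs -f fs_cfg constraint order start dlm-clone then VPSFSMount-clone
pcs -f fs_cfg constraint colocation add VPSFSMount-clone with dlm-clone

# Push the shadow CIB created with "pcs cluster cib fs_cfg" into the live cluster.
pcs cluster cib-push fs_cfg
```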
>
> It all works fine; here is an extract of the pcs status output:
>
> user $ sudo pcs status
> Cluster name: vps-fs
> Stack: corosync
> Current DC: vps-fs-04 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Tue Jul 14 11:13:55 2020
> Last change: Tue Jul 14 10:31:36 2020 by root via cibadmin on vps-fs-04
>
> 2 nodes configured
> 7 resources configured
>
> Online: [ vps-fs-03 vps-fs-04 ]
>
> Full list of resources:
>
>  stonith_vps-fs (stonith:external/ssh): Started vps-fs-04
>  Clone Set: dlm-clone [dlm]
>      Started: [ vps-fs-03 vps-fs-04 ]
>  Master/Slave Set: VPSFSClone [VPSFS]
>      Masters: [ vps-fs-03 vps-fs-04 ]
>  Clone Set: VPSFSMount-clone [VPSFSMount]
>      Started: [ vps-fs-03 vps-fs-04 ]
>
> Daemon Status:
> corosync: active/enabled
> pacemaker: active/enabled
> pcsd: active/enabled
>
> I can start CTDB (the Samba cluster manager) manually and it's fine.
> However, CTDB shares a lock file between both nodes, which is located
> on the shared mount point.
>
> The problem starts when I reboot one of the servers (vps-fs-04) and
> Pacemaker (and Corosync) are started automatically on boot (I'm talking
> about an unexpected reboot, not a maintenance reboot, which I haven't
> tried yet). After the reboot, the server (vps-fs-04) comes back online
> and rejoins the cluster, but the node that wasn't rebooted has an issue
> with the mount resource:
>
> user $ sudo pcs status
> Cluster name: vps-fs
> Stack: corosync
> Current DC: vps-fs-03 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Tue Jul 14 11:33:44 2020
> Last change: Tue Jul 14 10:31:36 2020 by root via cibadmin on vps-fs-04
>
> 2 nodes configured
> 7 resources configured
>
> Node vps-fs-03: UNCLEAN (online)
> Online: [ vps-fs-04 ]
>
> Full list of resources:
>
>  stonith_vps-fs (stonith:external/ssh): Started vps-fs-03
>  Clone Set: dlm-clone [dlm]
>      Started: [ vps-fs-03 vps-fs-04 ]
>  Master/Slave Set: VPSFSClone [VPSFS]
>      Masters: [ vps-fs-03 ]
>      Slaves: [ vps-fs-04 ]
>  Clone Set: VPSFSMount-clone [VPSFSMount]
>      VPSFSMount (ocf::heartbeat:Filesystem): FAILED vps-fs-03
>      Stopped: [ vps-fs-04 ]
>
> Failed Actions:
> * VPSFSMount_stop_0 on vps-fs-03 'unknown error' (1): call=65,
>     status=Timed Out, exitreason='Couldn't unmount /srv/vps-fs; trying
>     cleanup with KILL', last-rc-change='Tue Jul 14 11:23:46 2020',
>     queued=0ms, exec=60011ms
>
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> The problem seems to come from the fact that the mount point
> (/srv/vps-fs) is busy (probably the CTDB lock file), but what I don't
> understand is why the server that wasn't rebooted (vps-fs-03) needs to
> remount an already mounted file system when the other node comes back
> online.
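[To see what keeps the mount point busy during the failed stop, a hedged diagnostic sketch; the path comes from the message, and the idea that ctdbd's lock file is the culprit is only an assumption:]

```shell
# List processes holding files open under the mount point;
# a CTDB lock file would show up as an open file held by ctdbd.
sudo fuser -vm /srv/vps-fs

# Alternative view with lsof, listing open files under the directory.
sudo lsof +D /srv/vps-fs
```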
>
> I've checked the 'ocf:heartbeat:Filesystem' documentation, but nothing
> seemed to help. The only thing I changed was the following:
>
> user $ sudo pcs resource update VPSFSMount fast_stop="no" \
>     op monitor timeout="60"
>
> However, this didn't help. Google doesn't give me much help either
> (but maybe I'm not searching for the right thing).
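[The failed action above shows exec=60011ms with status=Timed Out, which looks like a 60-second stop timeout on the unmount. A hedged sketch of raising the stop operation's timeout to give CTDB time to release its lock file; the 120s value is an assumed example, not a recommendation from this thread:]

```shell
# Give the stop (unmount) operation more time before Pacemaker declares
# it failed and escalates; 120s is an assumed example value.
pcs resource update VPSFSMount op stop timeout=120s
```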
>
> Thank you in advance for any pointer!
>
>
> Kr,
>
> Gregory
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/