[ClusterLabs] Re: [EXT] Automatic restart of Pacemaker after reboot and filesystem unmount problem
Grégory Sacré
gregory.sacre at s-clinica.com
Wed Jul 15 04:08:25 EDT 2020
Hello!
> I just wonder: Does "UNCLEAN (online)" mean you have no fencing configured?
I have set up SSH fencing just for testing purposes.
Kr,
Gregory
-----Original Message-----
From: Users <users-bounces at clusterlabs.org> On Behalf Of Ulrich Windl
Sent: 15 July 2020 09:26
To: users at clusterlabs.org
Subject: [ClusterLabs] Re: [EXT] Automatic restart of Pacemaker after reboot and filesystem unmount problem
Hi!
I just wonder: Does "UNCLEAN (online)" mean you have no fencing configured?
>>> Grégory Sacré <gregory.sacre at s-clinica.com> wrote on 14.07.2020 at 13:56 in message
<FE83686A028367468065016AC856B02107B7D745 at IAS-EX10-01.S-Clinica.int>:
> Dear all,
>
>
> I'm pretty new to Pacemaker, so I must be missing something, but I
> cannot find it in the documentation.
>
> I'm setting up a Samba file server cluster with DRBD and Pacemaker.
> Here are the relevant pcs commands for the mount part:
>
> user $ sudo pcs cluster cib fs_cfg
> user $ sudo pcs -f fs_cfg resource create VPSFSMount Filesystem \
>     device="/dev/drbd1" directory="/srv/vps-fs" fstype="gfs2" \
>     options="acl,noatime"
> Assumed agent name 'ocf:heartbeat:Filesystem' (deduced from 'Filesystem')
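[The status output below shows the mount as a clone (VPSFSMount-clone) running on both nodes, although the quoted commands stop at resource creation. A hedged sketch of the steps presumably used to clone the resource and push the shadow CIB; the resource and file names come from the message, while the DLM ordering/colocation constraints are an assumption based on the usual GFS2 setup:]

```shell
# Clone the Filesystem resource so GFS2 is mounted on both nodes
# (GFS2 is a cluster filesystem, so an active/active clone is the usual setup).
pcs -f fs_cfg resource clone VPSFSMount

# Assumption: GFS2 needs DLM running first, so order and colocate the clones.
pcs -f fs_cfg constraint order start dlm-clone then VPSFSMount-clone
pcs -f fs_cfg constraint colocation add VPSFSMount-clone with dlm-clone

# Push the shadow CIB created with "pcs cluster cib fs_cfg" into the live cluster.
pcs cluster cib-push fs_cfg
```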
>
> It all works fine; here is an extract of the pcs status output:
>
> user $ sudo pcs status
> Cluster name: vps-fs
> Stack: corosync
> Current DC: vps-fs-04 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Tue Jul 14 11:13:55 2020
> Last change: Tue Jul 14 10:31:36 2020 by root via cibadmin on vps-fs-04
>
> 2 nodes configured
> 7 resources configured
>
> Online: [ vps-fs-03 vps-fs-04 ]
>
> Full list of resources:
>
>  stonith_vps-fs (stonith:external/ssh): Started vps-fs-04
>  Clone Set: dlm-clone [dlm]
>      Started: [ vps-fs-03 vps-fs-04 ]
>  Master/Slave Set: VPSFSClone [VPSFS]
>      Masters: [ vps-fs-03 vps-fs-04 ]
>  Clone Set: VPSFSMount-clone [VPSFSMount]
>      Started: [ vps-fs-03 vps-fs-04 ]
>
> Daemon Status:
> corosync: active/enabled
> pacemaker: active/enabled
> pcsd: active/enabled
>
> I can start CTDB (the Samba cluster manager) manually and it's fine.
> However, CTDB shares a lock file between both nodes, which is located
> on the shared mount point.
>
> The problem starts when I reboot one of the servers (vps-fs-04) and
> Pacemaker (and Corosync) are started automatically on boot (I'm talking
> about an unexpected reboot, not a maintenance reboot, which I haven't
> tried yet). After the reboot, the server (vps-fs-04) comes back online
> and rejoins the cluster, but the node that wasn't rebooted has an issue
> with the mount resource:
>
> user $ sudo pcs status
> Cluster name: vps-fs
> Stack: corosync
> Current DC: vps-fs-03 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Tue Jul 14 11:33:44 2020
> Last change: Tue Jul 14 10:31:36 2020 by root via cibadmin on vps-fs-04
>
> 2 nodes configured
> 7 resources configured
>
> Node vps-fs-03: UNCLEAN (online)
> Online: [ vps-fs-04 ]
>
> Full list of resources:
>
>  stonith_vps-fs (stonith:external/ssh): Started vps-fs-03
>  Clone Set: dlm-clone [dlm]
>      Started: [ vps-fs-03 vps-fs-04 ]
>  Master/Slave Set: VPSFSClone [VPSFS]
>      Masters: [ vps-fs-03 ]
>      Slaves: [ vps-fs-04 ]
>  Clone Set: VPSFSMount-clone [VPSFSMount]
>      VPSFSMount (ocf::heartbeat:Filesystem): FAILED vps-fs-03
>      Stopped: [ vps-fs-04 ]
>
> Failed Actions:
> * VPSFSMount_stop_0 on vps-fs-03 'unknown error' (1): call=65,
>     status=Timed Out, exitreason='Couldn't unmount /srv/vps-fs; trying
>     cleanup with KILL', last-rc-change='Tue Jul 14 11:23:46 2020',
>     queued=0ms, exec=60011ms
>
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> The problem seems to come from the fact that the mount point
> (/srv/vps-fs) is busy (probably the CTDB lock file), but what I don't
> understand is why the server that wasn't rebooted (vps-fs-03) needs to
> remount an already mounted file system when the other node comes back
> online.
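[To see what keeps the mount point busy during the failed stop, a hedged diagnostic sketch; the path comes from the message, and the idea that ctdbd's lock file is the culprit is only an assumption:]

```shell
# List processes holding files open under the mount point;
# a CTDB lock file would show up as an open file held by ctdbd.
sudo fuser -vm /srv/vps-fs

# Alternative view with lsof, listing open files under the directory.
sudo lsof +D /srv/vps-fs
```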
>
> I've checked the 'ocf:heartbeat:Filesystem' documentation, but nothing
> seemed to help. The only thing I changed was the following:
>
> user $ sudo pcs resource update VPSFSMount fast_stop="no" \
>     op monitor timeout="60"
>
> However, this didn't help. Google doesn't give me much help either
> (but maybe I'm not searching for the right thing).
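[The failed action above shows exec=60011ms with status=Timed Out, which looks like a 60-second stop timeout on the unmount. A hedged sketch of raising the stop operation's timeout to give CTDB time to release its lock file; the 120s value is an assumed example, not a recommendation from this thread:]

```shell
# Give the stop (unmount) operation more time before Pacemaker declares
# it failed and escalates; 120s is an assumed example value.
pcs resource update VPSFSMount op stop timeout=120s
```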
>
> Thank you in advance for any pointer!
>
>
> Kr,
>
> Gregory
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/