[ClusterLabs] Re: [EXT] Automatic restart of Pacemaker after reboot and filesystem unmount problem

Wed Jul 15 03:26:08 EDT 2020

Hi!

I just wonder: Does "UNCLEAN (online)" mean you have no fencing configured?
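
A quick way to check this is sketched below. It assumes pcs 0.9.x command names (which would match Pacemaker 1.1.18) and may differ on newer releases; unclean_nodes is a hypothetical helper, not part of pcs, that only filters the "pcs status" text for nodes flagged UNCLEAN.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs): read "pcs status" text on stdin
# and print the name of every node reported as UNCLEAN.
unclean_nodes() {
    sed -n 's/^Node \([^:]*\): UNCLEAN.*/\1/p'
}

# Usage on a live cluster (assumed pcs 0.9.x syntax, not executed here):
#   sudo pcs status | unclean_nodes
#   sudo pcs property show --all | grep stonith-enabled
#   sudo pcs stonith show --full
```

If the helper prints a node name while stonith-enabled is true, the cluster most likely wanted to fence that node and has not confirmed it yet.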

>>> Grégory Sacré <gregory.sacre at s-clinica.com> wrote on 14.07.2020 at 13:56
in message
<FE83686A028367468065016AC856B02107B7D745 at IAS-EX10-01.S-Clinica.int>:
> Dear all,
> 
> 
> I'm pretty new to Pacemaker so I must be missing something but I cannot find
> it in the documentation.
> 
> I'm setting up a SAMBA File Server cluster with DRBD and Pacemaker. Here are
> the relevant pcs commands related to the mount part:
> 
> user $ sudo pcs cluster cib fs_cfg
> user $ sudo pcs -f fs_cfg resource create VPSFSMount Filesystem \
>     device="/dev/drbd1" directory="/srv/vps-fs" fstype="gfs2" \
>     "options=acl,noatime"
>   Assumed agent name 'ocf:heartbeat:Filesystem' (deduced from 'Filesystem')
> 
> It all works fine, here is an extract of the pcs status command:
> 
> user $ sudo pcs status
> Cluster name: vps-fs
> Stack: corosync
> Current DC: vps-fs-04 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Tue Jul 14 11:13:55 2020
> Last change: Tue Jul 14 10:31:36 2020 by root via cibadmin on vps-fs-04
> 
> 2 nodes configured
> 7 resources configured
> 
> Online: [ vps-fs-03 vps-fs-04 ]
> 
> Full list of resources:
> 
> stonith_vps-fs (stonith:external/ssh): Started vps-fs-04
> Clone Set: dlm-clone [dlm]
>      Started: [ vps-fs-03 vps-fs-04 ]
> Master/Slave Set: VPSFSClone [VPSFS]
>      Masters: [ vps-fs-03 vps-fs-04 ]
> Clone Set: VPSFSMount-clone [VPSFSMount]
>      Started: [ vps-fs-03 vps-fs-04 ]
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> I can start CTDB (the SAMBA cluster manager) manually and it's fine. However,
> CTDB shares a lock file between both nodes, which is located on the shared
> mount point.
> 
> The problem appears when I reboot one of the servers (vps-fs-04) and
> Pacemaker (and Corosync) are started automatically upon boot (I'm talking
> about an unexpected reboot, not a maintenance reboot, which I haven't tried
> yet). After the reboot, the server (vps-fs-04) comes back online and rejoins
> the cluster, but the one that wasn't rebooted has an issue with the mount
> resource:
> 
> user $ sudo pcs status
> Cluster name: vps-fs
> Stack: corosync
> Current DC: vps-fs-03 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Tue Jul 14 11:33:44 2020
> Last change: Tue Jul 14 10:31:36 2020 by root via cibadmin on vps-fs-04
> 
> 2 nodes configured
> 7 resources configured
> 
> Node vps-fs-03: UNCLEAN (online)
> Online: [ vps-fs-04 ]
> 
> Full list of resources:
> 
> stonith_vps-fs (stonith:external/ssh): Started vps-fs-03
> Clone Set: dlm-clone [dlm]
>      Started: [ vps-fs-03 vps-fs-04 ]
> Master/Slave Set: VPSFSClone [VPSFS]
>      Masters: [ vps-fs-03 ]
>      Slaves: [ vps-fs-04 ]
> Clone Set: VPSFSMount-clone [VPSFSMount]
>      VPSFSMount (ocf::heartbeat:Filesystem):    FAILED vps-fs-03
>      Stopped: [ vps-fs-04 ]
> 
> Failed Actions:
> * VPSFSMount_stop_0 on vps-fs-03 'unknown error' (1): call=65, status=Timed Out, exitreason='Couldn't unmount /srv/vps-fs; trying cleanup with KILL',
>     last-rc-change='Tue Jul 14 11:23:46 2020', queued=0ms, exec=60011ms
> 
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> The problem seems to come from the fact that the mount point (/srv/vps-fs) is
> busy (probably because of the CTDB lock file), but what I don't understand is
> why the server that was not rebooted (vps-fs-03) needs to remount an already
> mounted file system when the other node comes back online.
> 
> I've checked the 'ocf:heartbeat:Filesystem' documentation but nothing seemed
> to help. The only thing I did was to change the following:
> 
> user $ sudo pcs resource update VPSFSMount fast_stop="no" op monitor 
> timeout="60"
> 
> However, this didn't help. Google doesn't give me much help either (but maybe
> I'm not searching for the right thing).
> 
> Thank you in advance for any pointer!
> 
> 
> Kr,
> 
> Gregory