[ClusterLabs] Fedora 31 - systemd based resources don't start
Ken Gaillot
kgaillot at redhat.com
Mon Mar 30 22:39:39 EDT 2020
On Wed, 2020-02-19 at 18:21 +0100, Maverick wrote:
> How is it possible that pacemaker reports that it takes 4.2 minutes
> (254930ms) to execute the start of the httpd systemd unit?
Sorry I didn't get a chance to look into this sooner.
Fedora 31 introduced a change where the ftime() call that pacemaker had
been using for operation timing was no longer available. We implemented
clock_gettime()-based timing in a rush because it happened right before
the release of 2.0.3. We enabled that code only for systems like Fedora
31 that didn't support ftime().
The clock_gettime()-based code turned out to have a bug that was
recently fixed. The fixes will be in 2.0.4 (the first release candidate
should come out in a couple of weeks) which will then be packaged for
Fedora 31 and 32.
> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (log_execute) info: executing - rsc:apache action:start call_id:25
> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (systemd_unit_exec) debug: Performing asynchronous start op on systemd unit httpd named 'apache'
> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (systemd_unit_exec_with_unit) debug: Calling StartUnit for apache: /org/freedesktop/systemd1/unit/httpd_2eservice
> Feb 19 17:04:10 boss1 pacemaker-execd [1514] (action_complete) notice: Giving up on apache start (rc=0): timeout (elapsed=254930ms, remaining=-154930ms)
> Feb 19 17:04:10 boss1 pacemaker-execd [1514] (log_finished) debug: finished - rsc:apache action:monitor call_id:25 exit-code:198 exec-time:254935ms queue-time:235ms
>
>
> Starting manually works fine and fast:
>
> # time systemctl start httpd
> real 0m0.144s
> user 0m0.005s
> sys 0m0.008s
>
>
> On 17/02/2020 22:47, Mvrk wrote:
> > Attached is the pacemaker.log. In the log I can see that the cluster
> > tries to start, the start fails, then it tries to stop, and the stop
> > fails too.
> >
> > One more thing: my cluster was working fine on Fedora 28. I started
> > having this problem after upgrading to Fedora 31.
> >
> > On 17/02/2020 21:30, Ricardo Esteves wrote:
> > > Hi,
> > >
> > > Yes, I also don't understand why it is trying to stop them first.
> > >
> > > SELinux is disabled:
> > >
> > > # getenforce
> > > Disabled
> > >
> > > All systemd services controlled by the cluster are disabled from
> > > starting at boot:
> > >
> > > # systemctl is-enabled httpd
> > > disabled
> > >
> > > # systemctl is-enabled openvpn-server@01-server
> > > disabled
> > >
> > >
> > > On 17/02/2020 20:28, Ken Gaillot wrote:
> > > > On Mon, 2020-02-17 at 17:35 +0000, Maverick wrote:
> > > > > Hi,
> > > > >
> > > > > When I start my cluster, most of my systemd resources won't start:
> > > > >
> > > > > Failed Resource Actions:
> > > > > * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82,
> > > > > status='Timed Out', exitreason='', last-rc-change='1970-01-01
> > > > > 01:00:54 +01:00', queued=29ms, exec=197799ms
> > > > > * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61,
> > > > > status='Timed Out', exitreason='', last-rc-change='1970-01-01
> > > > > 01:00:54 +01:00', queued=1805ms, exec=198841ms
> > > >
> > > > These show that attempts to stop failed, rather than start.
> > > >
> > > > > So every time I reboot my node, I need to start the resources
> > > > > manually using systemd, for example:
> > > > >
> > > > > systemctl start httpd
> > > > >
> > > > > and then pcs resource cleanup
> > > > >
> > > > > Resources configuration:
> > > > >
> > > > > Clone: apache-clone
> > > > >  Meta Attrs: maintenance=false
> > > > >  Resource: apache (class=systemd type=httpd)
> > > > >   Meta Attrs: maintenance=false
> > > > >   Operations: monitor interval=60 timeout=100 (apache-monitor-interval-60)
> > > > >               start interval=0s timeout=100 (apache-start-interval-0s)
> > > > >               stop interval=0s timeout=100 (apache-stop-interval-0s)
> > > > >
> > > > > Resource: openvpn (class=systemd type=openvpn-server@01-server)
> > > > >  Meta Attrs: maintenance=false
> > > > >  Operations: monitor interval=60 timeout=100 (openvpn-monitor-interval-60)
> > > > >              start interval=0s timeout=100 (openvpn-start-interval-0s)
> > > > >              stop interval=0s timeout=100 (openvpn-stop-interval-0s)
> > > > >
> > > > > Btw, if I try a debug-start / debug-stop, the mentioned resources
> > > > > start and stop OK.
> > > >
> > > > Based on that, my first guess would be SELinux. Check the SELinux
> > > > logs for denials.
> > > >
> > > > Also, make sure your systemd services are not enabled in systemd
> > > > itself (e.g. via systemctl enable). Clustered systemd services
> > > > should be managed by the cluster only.
--
Ken Gaillot <kgaillot at redhat.com>