[ClusterLabs] Fedora 31 - systemd based resources don't start

Mon Mar 30 22:39:39 EDT 2020

On Wed, 2020-02-19 at 18:21 +0100, Maverick wrote:
> How is it possible that pacemaker is reporting that it takes 4.2 minutes
> (254930ms) to execute the start of the httpd systemd unit?

Sorry I didn't get a chance to look into this sooner.

In Fedora 31, the ftime() call that pacemaker had been using for
operation timing is no longer available. This came up right before the
release of 2.0.3, so we implemented clock_gettime()-based timing in a
rush, and enabled that code only on systems like Fedora 31 that don't
support ftime().

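The clock_gettime()-based code turned out to have a bug, which was
recently fixed. That bug is where the bogus elapsed time comes from: in
the log below, the start is executed at 17:04:09 and given up on at
17:04:10, yet it is reported as elapsed=254930ms. The fixes will be in
2.0.4 (the first release candidate should come out in a couple of
weeks), which will then be packaged for Fedora 31 and 32.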
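
For illustration only, here is a minimal sketch of what clock_gettime()-based
operation timing can look like. It is not the actual pacemaker code and does
not reproduce the bug; the function names (timespec_diff_ms, elapsed_ms_since,
remaining_ms) are hypothetical, and the use of CLOCK_MONOTONIC is an
assumption. The "remaining" value is simply the timeout minus the elapsed
time, which matches the quoted log if the configured timeout=100 is taken as
100000ms: 100000 - 254930 = -154930.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: milliseconds from *start to *end.
 * The difference is computed in nanoseconds first, so a tv_nsec
 * field that is smaller in *end than in *start is handled correctly. */
int64_t timespec_diff_ms(const struct timespec *start,
                         const struct timespec *end)
{
    assert(start != NULL && end != NULL);

    int64_t sec = (int64_t) end->tv_sec - (int64_t) start->tv_sec;
    int64_t nsec = (int64_t) end->tv_nsec - (int64_t) start->tv_nsec;

    return (sec * INT64_C(1000000000) + nsec) / INT64_C(1000000);
}

/* Hypothetical helper: milliseconds elapsed since *start, or -1 if the
 * clock cannot be read. CLOCK_MONOTONIC (an assumption here) is not
 * affected by wall-clock adjustments. */
int64_t elapsed_ms_since(const struct timespec *start)
{
    struct timespec now;

    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0) {
        return -1;
    }
    return timespec_diff_ms(start, &now);
}

/* Hypothetical helper: time left before the operation times out.
 * A negative result means the timeout has already been exceeded. */
int64_t remaining_ms(int64_t timeout_ms, int64_t elapsed_ms)
{
    return timeout_ms - elapsed_ms;
}
```

A caller would record a struct timespec with clock_gettime() when the
operation starts, then pass it to elapsed_ms_since() whenever it needs to
decide whether the operation has exceeded its timeout.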

> Feb 19 17:04:09 boss1 pacemaker-execd     [1514] (log_execute)     info: executing - rsc:apache action:start call_id:25
> Feb 19 17:04:09 boss1 pacemaker-execd     [1514] (systemd_unit_exec)     debug: Performing asynchronous start op on systemd unit httpd named 'apache'
> Feb 19 17:04:09 boss1 pacemaker-execd     [1514] (systemd_unit_exec_with_unit)     debug: Calling StartUnit for apache: /org/freedesktop/systemd1/unit/httpd_2eservice
> Feb 19 17:04:10 boss1 pacemaker-execd     [1514] (action_complete)     notice: Giving up on apache start (rc=0): timeout (elapsed=254930ms, remaining=-154930ms)
> Feb 19 17:04:10 boss1 pacemaker-execd     [1514] (log_finished)     debug: finished - rsc:apache action:monitor call_id:25  exit-code:198 exec-time:254935ms queue-time:235ms
> 
> 
> Starting manually works fine and fast:
> 
> # time systemctl start httpd
> real    0m0.144s
> user    0m0.005s
> sys    0m0.008s
> 
> 
> On 17/02/2020 22:47, Mvrk wrote:
> > The pacemaker.log is attached. In the log I can see that the
> > cluster tries to start, the start fails, then it tries to stop, and
> > the stop also fails.
> > 
> > One more thing, my cluster was working fine on Fedora 28; I started
> > having this problem after the upgrade to Fedora 31.
> > 
> > On 17/02/2020 21:30, Ricardo Esteves wrote:
> > > Hi,
> > > 
> > > Yes, I also don't understand why it is trying to stop them first.
> > > 
> > > SELinux is disabled:
> > > 
> > > # getenforce
> > > Disabled
> > > 
> > > All systemd services controlled by the cluster are disabled from
> > > starting at boot:
> > > 
> > > # systemctl is-enabled httpd
> > > disabled
> > > 
> > > # systemctl is-enabled openvpn-server@01-server
> > > disabled
> > > 
> > > 
> > > On 17/02/2020 20:28, Ken Gaillot wrote:
> > > > On Mon, 2020-02-17 at 17:35 +0000, Maverick wrote:
> > > > > Hi,
> > > > > 
> > > > > When I start my cluster, most of my systemd resources won't
> > > > > start:
> > > > > 
> > > > > Failed Resource Actions:
> > > > >   * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82, status='Timed Out', exitreason='', last-rc-change='1970-01-01 01:00:54 +01:00', queued=29ms, exec=197799ms
> > > > >   * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61, status='Timed Out', exitreason='', last-rc-change='1970-01-01 01:00:54 +01:00', queued=1805ms, exec=198841ms
> > > > 
> > > > These show that attempts to stop failed, rather than start.
> > > > 
> > > > > So every time I reboot my node, I need to start the resources
> > > > > manually using systemd, for example:
> > > > > 
> > > > > systemctl start httpd
> > > > > 
> > > > > and then pcs resource cleanup
> > > > > 
> > > > > Resources configuration:
> > > > > 
> > > > > Clone: apache-clone
> > > > >   Meta Attrs: maintenance=false
> > > > >   Resource: apache (class=systemd type=httpd)
> > > > >    Meta Attrs: maintenance=false
> > > > >    Operations: monitor interval=60 timeout=100 (apache-monitor-interval-60)
> > > > >                start interval=0s timeout=100 (apache-start-interval-0s)
> > > > >                stop interval=0s timeout=100 (apache-stop-interval-0s)
> > > > > 
> > > > > 
> > > > > 
> > > > > Resource: openvpn (class=systemd type=openvpn-server@01-server)
> > > > >    Meta Attrs: maintenance=false
> > > > >    Operations: monitor interval=60 timeout=100 (openvpn-monitor-interval-60)
> > > > >                start interval=0s timeout=100 (openvpn-start-interval-0s)
> > > > >                stop interval=0s timeout=100 (openvpn-stop-interval-0s)
> > > > > 
> > > > > 
> > > > > 
> > > > > By the way, if I try a debug-start / debug-stop, the mentioned
> > > > > resources start and stop OK.
> > > > 
> > > > Based on that, my first guess would be SELinux. Check the
> > > > SELinux logs
> > > > for denials.
> > > > 
> > > > Also, make sure your systemd services are not enabled in
> > > > systemd itself
> > > > (e.g. via systemctl enable). Clustered systemd services should
> > > > be
> > > > managed by the cluster only.
> 
-- 
Ken Gaillot <kgaillot at redhat.com>