[ClusterLabs] Fedora 31 - systemd based resources don't start

Wed Feb 19 12:21:12 EST 2020

How is it possible that pacemaker reports that it takes 4.2 minutes
(254930ms) to start the httpd systemd unit?

Feb 19 17:04:09 boss1 pacemaker-execd     [1514] (log_execute)     info: executing - rsc:apache action:start call_id:25
Feb 19 17:04:09 boss1 pacemaker-execd     [1514] (systemd_unit_exec)     debug: Performing asynchronous start op on systemd unit httpd named 'apache'
Feb 19 17:04:09 boss1 pacemaker-execd     [1514] (systemd_unit_exec_with_unit)     debug: Calling StartUnit for apache: /org/freedesktop/systemd1/unit/httpd_2eservice
Feb 19 17:04:10 boss1 pacemaker-execd     [1514] (action_complete)     notice: Giving up on apache start (rc=0): timeout (elapsed=254930ms, remaining=-154930ms)
Feb 19 17:04:10 boss1 pacemaker-execd     [1514] (log_finished)     debug: finished - rsc:apache action:monitor call_id:25  exit-code:198 exec-time:254935ms queue-time:235ms
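
The numbers in that log excerpt can be cross-checked with a short Python
sketch. It is only an illustration: the year 2020 is taken from the date of
this mail, and it assumes that "remaining" is the operation timeout minus
"elapsed" (which would match the timeout=100 configured for the start
operation in the quoted configuration below).

```python
import re
from datetime import datetime

# Values copied from the log excerpt above.
START_STAMP = "Feb 19 17:04:09"    # log_execute: executing rsc:apache action:start
GIVE_UP_STAMP = "Feb 19 17:04:10"  # action_complete: Giving up on apache start
GIVE_UP_MSG = "timeout (elapsed=254930ms, remaining=-154930ms)"


def parse_stamp(stamp, year=2020):
    # The log lines carry no year; 2020 is an assumption taken from the mail date.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def parse_elapsed_remaining(message):
    # Extract the two millisecond counters from the "Giving up" message.
    match = re.search(r"elapsed=(-?\d+)ms, remaining=(-?\d+)ms", message)
    if match is None:
        raise ValueError("no elapsed/remaining counters found")
    return int(match.group(1)), int(match.group(2))


elapsed_ms, remaining_ms = parse_elapsed_remaining(GIVE_UP_MSG)

# Assumption: remaining = timeout - elapsed, so timeout = elapsed + remaining.
implied_timeout_ms = elapsed_ms + remaining_ms

# Time that actually passed between the two log lines (1 s timestamp resolution).
wall_clock_ms = int(
    (parse_stamp(GIVE_UP_STAMP) - parse_stamp(START_STAMP)).total_seconds() * 1000
)

print(f"reported elapsed : {elapsed_ms} ms ({elapsed_ms / 60000:.1f} min)")
print(f"implied timeout  : {implied_timeout_ms} ms")
print(f"wall-clock delta : {wall_clock_ms} ms")
```

Under those assumptions the implied timeout is 100000 ms, while the two log
lines are only about one second apart, so the reported elapsed time does not
match the timestamps.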

Starting manually works fine and fast:

# time systemctl start httpd
real    0m0.144s
user    0m0.005s
sys    0m0.008s

On 17/02/2020 22:47, Mvrk wrote:
> The pacemaker.log is attached. In the log I can see that the cluster
> tries to start, the start fails, then it tries to stop, and the stop
> also fails.
>
> One more thing: my cluster was working fine on Fedora 28; I started
> having this problem after upgrading to Fedora 31.
>
> On 17/02/2020 21:30, Ricardo Esteves wrote:
>> Hi,
>>
>> Yes, I also don't understand why it is trying to stop them first.
>>
>> SELinux is disabled:
>>
>> # getenforce
>> Disabled
>>
>> All systemd services controlled by the cluster are disabled from
>> starting at boot:
>>
>> # systemctl is-enabled httpd
>> disabled
>>
>> # systemctl is-enabled openvpn-server@01-server
>> disabled
>>
>>
>> On 17/02/2020 20:28, Ken Gaillot wrote:
>>> On Mon, 2020-02-17 at 17:35 +0000, Maverick wrote:
>>>> Hi,
>>>>
>>>> When I start my cluster, most of my systemd resources won't start:
>>>>
>>>> Failed Resource Actions:
>>>>   * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82,
>>>> status='Timed Out', exitreason='', last-rc-change='1970-01-01
>>>> 01:00:54 +01:00', queued=29ms, exec=197799ms
>>>>   * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61,
>>>> status='Timed Out', exitreason='', last-rc-change='1970-01-01
>>>> 01:00:54 +01:00', queued=1805ms, exec=198841ms
>>> These show that attempts to stop failed, rather than start.
>>>
>>>> So every time I reboot my node, I need to start the resources manually
>>>> using systemd, for example:
>>>>
>>>> systemctl start httpd
>>>>
>>>> and then pcs resource cleanup
>>>>
>>>> Resources configuration:
>>>>
>>>> Clone: apache-clone
>>>>   Meta Attrs: maintenance=false
>>>>   Resource: apache (class=systemd type=httpd)
>>>>    Meta Attrs: maintenance=false
>>>>    Operations: monitor interval=60 timeout=100 (apache-monitor-interval-60)
>>>>                start interval=0s timeout=100 (apache-start-interval-0s)
>>>>                stop interval=0s timeout=100 (apache-stop-interval-0s)
>>>>
>>>>
>>>>
>>>> Resource: openvpn (class=systemd type=openvpn-server@01-server)
>>>>    Meta Attrs: maintenance=false
>>>>    Operations: monitor interval=60 timeout=100 (openvpn-monitor-interval-60)
>>>>                start interval=0s timeout=100 (openvpn-start-interval-0s)
>>>>                stop interval=0s timeout=100 (openvpn-stop-interval-0s)
>>>>
>>>>
>>>>
>>>> Btw, if I try a debug-start / debug-stop, the mentioned resources
>>>> start and stop ok.
>>> Based on that, my first guess would be SELinux. Check the SELinux logs
>>> for denials.
>>>
>>> Also, make sure your systemd services are not enabled in systemd itself
>>> (e.g. via systemctl enable). Clustered systemd services should be
>>> managed by the cluster only.
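
The check suggested above (cluster-managed units must not be enabled in
systemd) can be scripted. The following Python sketch is only an
illustration: the function names and the list of states treated as "enabled"
are my own choices, and the unit names are the two from this thread (the
list archive renders the "@" in the openvpn template unit as " at ").

```python
import subprocess

# States printed by `systemctl is-enabled` that this sketch treats as
# "enabled outside the cluster" (an assumption, adjust as needed).
ENABLED_STATES = {"enabled", "enabled-runtime"}


def systemctl_is_enabled(unit):
    """Return the state string printed by `systemctl is-enabled UNIT`."""
    result = subprocess.run(
        ["systemctl", "is-enabled", unit],
        capture_output=True,
        text=True,
        check=False,  # non-zero exit is expected for disabled units
    )
    return result.stdout.strip()


def units_enabled_outside_cluster(units, query=systemctl_is_enabled):
    """Return the units whose state says systemd itself would start them."""
    return [unit for unit in units if query(unit) in ENABLED_STATES]


if __name__ == "__main__":
    # Unit names taken from the resource configuration in this thread.
    cluster_units = ["httpd", "openvpn-server@01-server"]
    offenders = units_enabled_outside_cluster(cluster_units)
    if offenders:
        print("Enabled in systemd (should be cluster-only):", ", ".join(offenders))
    else:
        print("No cluster-managed unit is enabled in systemd.")
```

The query function is passed in as a parameter so the logic can be tried
without a running systemd, for example with a dictionary lookup in its place.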