[ClusterLabs] Fedora 31 - systemd based resources don't start

Thu Feb 20 05:49:43 EST 2020

> You really need to debug the start & stop of the resource.
>
> Please try the debug procedure and provide the output:
> https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures
>
> Best Regards,
> Strahil Nikolov

Hi,

Correct me if I'm wrong, but I think that procedure doesn't work for
systemd class resources. I don't know which OCF script is responsible
for handling them.
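
For what it's worth, the pacemaker-execd log quoted below suggests that
systemd class resources may not go through an OCF script at all: it shows
a StartUnit call on the object path
/org/freedesktop/systemd1/unit/httpd_2eservice, which looks like a D-Bus
call to systemd. That path is the unit name with non-alphanumeric
characters hex-escaped ("." becomes "_2e"). The sketch below reproduces
that escaping in plain shell. unit_to_dbus_path is a hypothetical helper
name, and the sketch deliberately ignores edge cases such as an empty name
or a leading digit.

```shell
# Hypothetical helper: map a systemd unit name to its D-Bus object path.
# ASCII letters and digits are kept; every other character becomes "_" plus
# its two-digit lowercase hex code (simplified, see the caveats above).
unit_to_dbus_path() {
    unit=$1
    out=
    while [ -n "$unit" ]; do
        rest=${unit#?}          # everything after the first character
        c=${unit%"$rest"}       # the first character itself
        case $c in
            [A-Za-z0-9]) out=$out$c ;;
            *) out=$out$(printf '_%02x' "'$c") ;;
        esac
        unit=$rest
    done
    printf '/org/freedesktop/systemd1/unit/%s\n' "$out"
}

unit_to_dbus_path httpd.service
# prints /org/freedesktop/systemd1/unit/httpd_2eservice
```

Assuming busctl is installed, the resulting path could then be inspected
with something like:
busctl get-property org.freedesktop.systemd1 "$(unit_to_dbus_path httpd.service)" org.freedesktop.systemd1.Unit ActiveState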

Also, the crm command doesn't exist on RHEL/Fedora; I've only seen it
on SUSE.
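
On the timing in the log quoted below: the two figures in the "Giving up
on apache start" line add up to exactly the start timeout of 100 from the
resource configuration quoted further down, if that value is in seconds
and if remaining is simply timeout minus elapsed. Both are assumptions,
not something confirmed against the Pacemaker source. Note also that the
log timestamps for the start and the give-up are one second apart
(17:04:09 and 17:04:10), so the reported elapsed time does not match the
wall clock. A small shell sketch of the arithmetic:

```shell
# Values copied from the "Giving up on apache start" log line.
elapsed_ms=254930
remaining_ms=-154930

# Assumption: remaining = timeout - elapsed, so the timeout is their sum.
timeout_ms=$((elapsed_ms + remaining_ms))

# Break the elapsed time into whole minutes and leftover whole seconds.
elapsed_min=$((elapsed_ms / 60000))
elapsed_sec=$(( (elapsed_ms % 60000) / 1000 ))

echo "timeout=${timeout_ms}ms elapsed=${elapsed_min}m${elapsed_sec}s"
# prints timeout=100000ms elapsed=4m14s
```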

On 19/02/2020 19:23, Strahil Nikolov wrote:
> On February 19, 2020 7:21:12 PM GMT+02:00, Maverick <mvrk at sapo.pt> wrote:
>> How is it possible that pacemaker reports that it takes 4.2 minutes
>> (254930ms) to execute the start of the httpd systemd unit?
>>
>> Feb 19 17:04:09 boss1 pacemaker-execd     [1514] (log_execute)     info: executing - rsc:apache action:start call_id:25
>> Feb 19 17:04:09 boss1 pacemaker-execd     [1514] (systemd_unit_exec)     debug: Performing asynchronous start op on systemd unit httpd named 'apache'
>> Feb 19 17:04:09 boss1 pacemaker-execd     [1514] (systemd_unit_exec_with_unit)     debug: Calling StartUnit for apache: /org/freedesktop/systemd1/unit/httpd_2eservice
>> Feb 19 17:04:10 boss1 pacemaker-execd     [1514] (action_complete)     notice: Giving up on apache start (rc=0): timeout (elapsed=254930ms, remaining=-154930ms)
>> Feb 19 17:04:10 boss1 pacemaker-execd     [1514] (log_finished)     debug: finished - rsc:apache action:monitor call_id:25  exit-code:198 exec-time:254935ms queue-time:235ms
>>
>>
>> Starting manually works fine and fast:
>>
>> # time systemctl start httpd
>> real    0m0.144s
>> user    0m0.005s
>> sys    0m0.008s
>>
>>
>> On 17/02/2020 22:47, Mvrk wrote:
>>> The pacemaker.log is attached. In the log I can see that the cluster
>>> tries to start, the start fails, then it tries to stop, and the stop
>>> also fails.
>>>
>>> One more thing: my cluster was working fine on Fedora 28. I started
>>> having this problem after upgrading to Fedora 31.
>>>
>>> On 17/02/2020 21:30, Ricardo Esteves wrote:
>>>> Hi,
>>>>
>>>> Yes, I also don't understand why it is trying to stop them first.
>>>>
>>>> SELinux is disabled:
>>>>
>>>> # getenforce
>>>> Disabled
>>>>
>>>> All systemd services controlled by the cluster are disabled from
>>>> starting at boot:
>>>>
>>>> # systemctl is-enabled httpd
>>>> disabled
>>>>
>>>> # systemctl is-enabled openvpn-server@01-server
>>>> disabled
>>>>
>>>>
>>>> On 17/02/2020 20:28, Ken Gaillot wrote:
>>>>> On Mon, 2020-02-17 at 17:35 +0000, Maverick wrote:
>>>>>> Hi,
>>>>>>
>>>>>> When I start my cluster, most of my systemd resources won't start:
>>>>>>
>>>>>> Failed Resource Actions:
>>>>>>   * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82, status='Timed Out', exitreason='', last-rc-change='1970-01-01 01:00:54 +01:00', queued=29ms, exec=197799ms
>>>>>>   * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61, status='Timed Out', exitreason='', last-rc-change='1970-01-01 01:00:54 +01:00', queued=1805ms, exec=198841ms
>>>>> These show that attempts to stop failed, rather than start.
>>>>>
>>>>>> So every time I reboot my node, I need to start the resources manually
>>>>>> using systemctl, for example:
>>>>>>
>>>>>> systemctl start httpd
>>>>>>
>>>>>> and then pcs resource cleanup
>>>>>>
>>>>>> Resources configuration:
>>>>>>
>>>>>> Clone: apache-clone
>>>>>>   Meta Attrs: maintenance=false
>>>>>>   Resource: apache (class=systemd type=httpd)
>>>>>>    Meta Attrs: maintenance=false
>>>>>>    Operations: monitor interval=60 timeout=100 (apache-monitor-interval-60)
>>>>>>                start interval=0s timeout=100 (apache-start-interval-0s)
>>>>>>                stop interval=0s timeout=100 (apache-stop-interval-0s)
>>>>>>
>>>>>>
>>>>>> Resource: openvpn (class=systemd type=openvpn-server@01-server)
>>>>>>    Meta Attrs: maintenance=false
>>>>>>    Operations: monitor interval=60 timeout=100 (openvpn-monitor-interval-60)
>>>>>>                start interval=0s timeout=100 (openvpn-start-interval-0s)
>>>>>>                stop interval=0s timeout=100 (openvpn-stop-interval-0s)
>>>>>>
>>>>>>
>>>>>>
>>>>>> Btw, if I try a debug-start / debug-stop, the mentioned resources
>>>>>> start and stop ok.
>>>>> Based on that, my first guess would be SELinux. Check the SELinux logs
>>>>> for denials.
>>>>>
>>>>> Also, make sure your systemd services are not enabled in systemd itself
>>>>> (e.g. via systemctl enable). Clustered systemd services should be
>>>>> managed by the cluster only.