[ClusterLabs] Fedora 31 - systemd based resources don't start

Thu Feb 20 14:35:07 EST 2020

Started manually, it works with no problems:

pcs resource debug-start apache --full
(unpack_config)     warning: Blind faith: not fencing unseen nodes
Operation start for apache (systemd::httpd) returned: 'ok' (0)

On 20/02/2020 16:46, Strahil Nikolov wrote:
> On February 20, 2020 12:49:43 PM GMT+02:00, Maverick <mvrk at sapo.pt> wrote:
>> Hi,
>>
>> Correct me if I'm wrong, but I think that procedure doesn't work for
>> systemd-class resources; I don't know which OCF script is responsible
>> for handling systemd-class resources.
>>
>> Also, the crm command doesn't exist in RHEL/Fedora; I've only seen it
>> in SUSE.
>>
>>
>>
>> On 19/02/2020 19:23, Strahil Nikolov wrote:
>>> On February 19, 2020 7:21:12 PM GMT+02:00, Maverick <mvrk at sapo.pt> wrote:
>>>> How is it possible that Pacemaker reports that it takes 4.2 minutes
>>>> (254930ms) to execute the start of the httpd systemd unit?
>>>>
>>>> Feb 19 17:04:09 boss1 pacemaker-execd     [1514] (log_execute)     info: executing - rsc:apache action:start call_id:25
>>>> Feb 19 17:04:09 boss1 pacemaker-execd     [1514] (systemd_unit_exec)     debug: Performing asynchronous start op on systemd unit httpd named 'apache'
>>>> Feb 19 17:04:09 boss1 pacemaker-execd     [1514] (systemd_unit_exec_with_unit)     debug: Calling StartUnit for apache: /org/freedesktop/systemd1/unit/httpd_2eservice
>>>> Feb 19 17:04:10 boss1 pacemaker-execd     [1514] (action_complete)     notice: Giving up on apache start (rc=0): timeout (elapsed=254930ms, remaining=-154930ms)
>>>> Feb 19 17:04:10 boss1 pacemaker-execd     [1514] (log_finished)     debug: finished - rsc:apache action:monitor call_id:25 exit-code:198 exec-time:254935ms queue-time:235ms
>>>>
>>>>
>>>> Starting manually works fine and fast:
>>>>
>>>> # time systemctl start httpd
>>>> real    0m0.144s
>>>> user    0m0.005s
>>>> sys    0m0.008s
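A short bash sketch (not from the thread) that re-checks the arithmetic in the log excerpt above. Assuming Pacemaker computes remaining = timeout - elapsed, the configured start timeout=100 (seconds, from the resource configuration quoted further down) reproduces the logged remaining=-154930ms exactly, while the timestamps of the StartUnit call and the "Giving up" line are only about one second apart. In other words, the ~255 s "elapsed" figure does not match the wall-clock time shown in the log.

```bash
#!/usr/bin/env bash
# Re-derive the numbers in the "Giving up on apache start" log line.
# Assumption: remaining = start timeout - elapsed, both in milliseconds.
timeout_ms=100000   # "start interval=0s timeout=100" (seconds) from the resource config
elapsed_ms=254930   # "elapsed=254930ms" from the log
remaining_ms=$(( timeout_ms - elapsed_ms ))
echo "remaining=${remaining_ms}ms"   # the log reports remaining=-154930ms

# Convert HH:MM:SS to seconds since midnight; 10# avoids octal parsing of "09".
to_secs() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Wall-clock gap between "Calling StartUnit" (17:04:09) and "Giving up" (17:04:10).
gap_s=$(( $(to_secs 17:04:10) - $(to_secs 17:04:09) ))
echo "wall-clock gap=${gap_s}s (1 s log resolution), reported elapsed=$(( elapsed_ms / 1000 ))s"
```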
>>>>
>>>>
>>>> On 17/02/2020 22:47, Mvrk wrote:
>>>>> Attached is the pacemaker.log. In the log I can see that the cluster
>>>>> tries to start the resources, the start fails, then it tries to stop
>>>>> them, and the stop fails as well.
>>>>>
>>>>> One more thing: my cluster was working fine on Fedora 28; I started
>>>>> having this problem after upgrading to Fedora 31.
>>>>>
>>>>> On 17/02/2020 21:30, Ricardo Esteves wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Yes, I also don't understand why it is trying to stop them first.
>>>>>>
>>>>>> SELinux is disabled:
>>>>>>
>>>>>> # getenforce
>>>>>> Disabled
>>>>>>
>>>>>> All systemd services controlled by the cluster are disabled from
>>>>>> starting at boot:
>>>>>>
>>>>>> # systemctl is-enabled httpd
>>>>>> disabled
>>>>>>
>>>>>> # systemctl is-enabled openvpn-server@01-server
>>>>>> disabled
>>>>>>
>>>>>>
>>>>>> On 17/02/2020 20:28, Ken Gaillot wrote:
>>>>>>> On Mon, 2020-02-17 at 17:35 +0000, Maverick wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> When I start my cluster, most of my systemd resources won't start:
>>>>>>>> Failed Resource Actions:
>>>>>>>>   * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82, status='Timed Out', exitreason='', last-rc-change='1970-01-01 01:00:54 +01:00', queued=29ms, exec=197799ms
>>>>>>>>   * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61, status='Timed Out', exitreason='', last-rc-change='1970-01-01 01:00:54 +01:00', queued=1805ms, exec=198841ms
>>>>>>> These show that attempts to stop failed, rather than start.
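Ken's reading comes from the operation key at the start of each failed action: Pacemaker names operations <resource>_<action>_<interval in ms>, so apache_stop_0 is a one-off (interval 0) stop of apache, not a start. A small bash sketch that splits such keys; the function name split_op_key is hypothetical. It parses from the right so underscores inside resource names survive, but action names that themselves contain an underscore (such as migrate_to) would be mis-split.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a Pacemaker operation key such as "apache_stop_0"
# into resource, action and interval (ms), parsing from the right.
split_op_key() {
  local key=$1 rest
  local interval=${key##*_}   # text after the last "_"
  rest=${key%_*}              # drop "_<interval>"
  local action=${rest##*_}    # text after the next "_" from the right
  local rsc=${rest%_*}        # everything before the action
  printf 'resource=%s action=%s interval=%sms\n' "$rsc" "$action" "$interval"
}
```

For example, split_op_key apache_stop_0 prints resource=apache action=stop interval=0ms.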
>>>>>>>
>>>>>>>> So every time I reboot my node, I need to start the resources manually
>>>>>>>> using systemd, for example:
>>>>>>>>
>>>>>>>> systemctl start httpd
>>>>>>>>
>>>>>>>> and then pcs resource cleanup
>>>>>>>>
>>>>>>>> Resources configuration:
>>>>>>>>
>>>>>>>> Clone: apache-clone
>>>>>>>>   Meta Attrs: maintenance=false
>>>>>>>>   Resource: apache (class=systemd type=httpd)
>>>>>>>>    Meta Attrs: maintenance=false
>>>>>>>>    Operations: monitor interval=60 timeout=100 (apache-monitor-interval-60)
>>>>>>>>                start interval=0s timeout=100 (apache-start-interval-0s)
>>>>>>>>                stop interval=0s timeout=100 (apache-stop-interval-0s)
>>>>>>>>
>>>>>>>> Resource: openvpn (class=systemd type=openvpn-server@01-server)
>>>>>>>>    Meta Attrs: maintenance=false
>>>>>>>>    Operations: monitor interval=60 timeout=100 (openvpn-monitor-interval-60)
>>>>>>>>                start interval=0s timeout=100 (openvpn-start-interval-0s)
>>>>>>>>                stop interval=0s timeout=100 (openvpn-stop-interval-0s)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Btw, if I try a debug-start / debug-stop, the mentioned resources
>>>>>>>> start and stop OK.
>>>>>>> Based on that, my first guess would be SELinux. Check the SELinux logs
>>>>>>> for denials.
>>>>>>>
>>>>>>> Also, make sure your systemd services are not enabled in systemd itself
>>>>>>> (e.g. via systemctl enable). Clustered systemd services should be
>>>>>>> managed by the cluster only.
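A minimal bash sketch of the second check Ken suggests: it runs systemctl is-enabled for each unit passed in and warns about any that are still enabled in systemd. The function name check_not_enabled is hypothetical; pass systemd unit names (httpd, openvpn-server@01-server), not the Pacemaker resource names. For the SELinux part, ausearch -m AVC -ts recent (from the audit tools) lists recent denials when SELinux is enforcing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: warn about cluster-managed units still enabled in systemd.
# Returns 1 if any unit is enabled, 0 otherwise.
check_not_enabled() {
  local unit state rc=0
  for unit in "$@"; do
    # "systemctl is-enabled" prints the unit file state, e.g. enabled, disabled, static
    state=$(systemctl is-enabled "$unit" 2>/dev/null)
    case "$state" in
      enabled|enabled-runtime)
        echo "WARNING: $unit is $state; run 'systemctl disable $unit' so only the cluster starts it"
        rc=1 ;;
      *)
        echo "ok: $unit is ${state:-unknown}" ;;
    esac
  done
  return "$rc"
}
```

Usage: check_not_enabled httpd openvpn-server@01-server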
>>>> _______________________________________________
>>>> Manage your subscription:
>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>
>>>> ClusterLabs home: https://www.clusterlabs.org/
>>> You really need to debug the start & stop of the resource.
>>>
>>> Please try the debug procedure and provide the output:
>>> https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures
>>>
>>> Best Regards,
>>> Strahil Nikolov
> Hi Maverick,
>
>
> You can replace 'crm resource stop' with 'pcs resource disable'.
> The rest of the procedure works, but sadly not for systemd resources.
>
> You can try:
> 'pcs resource debug-start <resource> --full'
> Another approach is to:
> 1. Copy the service unit file to /etc/systemd/system
> 2. In the '[Service]' section, add:
> Environment=SYSTEMD_LOG_LEVEL=debug
> 3. Reload systemd:
> systemctl daemon-reload
> Note: I assume you have a downtime window for debugging the issue.
> 4. Use 'pcs resource debug-start <resource> --full'
>
> Note: Don't forget to remove the debug setting afterwards, or your journal will fill up.
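A possible variation on the procedure above, sketched in bash: instead of copying the whole unit file, the same Environment= line can go into a systemd drop-in under /etc/systemd/system/<unit>.d/, which leaves the packaged unit untouched and is easy to remove afterwards. This is a swapped-in technique, not the steps Strahil describes; the file name cluster-debug.conf and both function names are hypothetical, and the functions only write or remove the file, so you still run systemctl daemon-reload yourself. Section names are case-sensitive, hence [Service].

```bash
#!/usr/bin/env bash
# Hypothetical helpers: add/remove a drop-in setting SYSTEMD_LOG_LEVEL=debug for one unit.
# Pass the full unit name, e.g. httpd.service. UNIT_DIR can be overridden for testing.
UNIT_DIR=${UNIT_DIR:-/etc/systemd/system}

enable_debug_dropin() {
  local dir="$UNIT_DIR/$1.d"
  mkdir -p "$dir" || return 1
  printf '[Service]\nEnvironment=SYSTEMD_LOG_LEVEL=debug\n' > "$dir/cluster-debug.conf"
  echo "wrote $dir/cluster-debug.conf; now run: systemctl daemon-reload"
}

disable_debug_dropin() {
  rm -f "$UNIT_DIR/$1.d/cluster-debug.conf"
  # Remove the .d directory only if nothing else is left in it.
  rmdir "$UNIT_DIR/$1.d" 2>/dev/null
  echo "removed debug drop-in for $1; now run: systemctl daemon-reload"
}
```

For example: enable_debug_dropin httpd.service, then systemctl daemon-reload and pcs resource debug-start apache --full; afterwards disable_debug_dropin httpd.service and reload systemd again.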
>
> Best Regards,
> Strahil Nikolov