[ClusterLabs] Fedora 31 - systemd based resources don't start

Strahil Nikolov hunter86_bg at yahoo.com
Thu Feb 20 10:46:37 EST 2020


On February 20, 2020 12:49:43 PM GMT+02:00, Maverick <mvrk at sapo.pt> wrote:
>
>> You really need to debug the start & stop of the resource.
>>
>> Please try the debug procedure and provide the output:
>> https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures
>>
>> Best Regards,
>> Strahil Nikolov
>
>
>Hi,
>
>Correct me if I'm wrong, but I think that procedure doesn't work for
>systemd class resources. I don't know which OCF script is responsible
>for handling systemd class resources.
>
>Also, the crm command doesn't exist in RHEL/Fedora; I've seen the crm
>command only in SUSE.
>
>
>
>On 19/02/2020 19:23, Strahil Nikolov wrote:
>> On February 19, 2020 7:21:12 PM GMT+02:00, Maverick <mvrk at sapo.pt> wrote:
>>> How is it possible that pacemaker is reporting that it takes 4.2 minutes
>>> (254930ms) to execute the start of the httpd systemd unit?
>>>
>>> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (log_execute) info: executing - rsc:apache action:start call_id:25
>>> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (systemd_unit_exec) debug: Performing asynchronous start op on systemd unit httpd named 'apache'
>>> Feb 19 17:04:09 boss1 pacemaker-execd [1514] (systemd_unit_exec_with_unit) debug: Calling StartUnit for apache: /org/freedesktop/systemd1/unit/httpd_2eservice
>>> Feb 19 17:04:10 boss1 pacemaker-execd [1514] (action_complete) notice: Giving up on apache start (rc=0): timeout (elapsed=254930ms, remaining=-154930ms)
>>> Feb 19 17:04:10 boss1 pacemaker-execd [1514] (log_finished) debug: finished - rsc:apache action:monitor call_id:25 exit-code:198 exec-time:254935ms queue-time:235ms
>>>
>>>
>>> Starting manually works fine and fast:
>>>
>>> # time systemctl start httpd
>>> real    0m0.144s
>>> user    0m0.005s
>>> sys    0m0.008s
>>>
>>>
>>> On 17/02/2020 22:47, Mvrk wrote:
>>>> Attached is the pacemaker.log. In the log I can see that the cluster
>>>> tries to start, the start fails, then it tries to stop, and the stop
>>>> also fails.
>>>>
>>>> One more thing, my cluster was working fine on Fedora 28, i started
>>>> having this problem after upgrade to Fedora 31.
>>>>
>>>> On 17/02/2020 21:30, Ricardo Esteves wrote:
>>>>> Hi,
>>>>>
>>>>> Yes, I also don't understand why it is trying to stop them first.
>>>>>
>>>>> SELinux is disabled:
>>>>>
>>>>> # getenforce
>>>>> Disabled
>>>>>
>>>>> All systemd services controlled by the cluster are disabled from
>>>>> starting at boot:
>>>>>
>>>>> # systemctl is-enabled httpd
>>>>> disabled
>>>>>
>>>>> # systemctl is-enabled openvpn-server@01-server
>>>>> disabled
>>>>>
>>>>>
>>>>> On 17/02/2020 20:28, Ken Gaillot wrote:
>>>>>> On Mon, 2020-02-17 at 17:35 +0000, Maverick wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> When I start my cluster, most of my systemd resources won't start:
>>>>>>>
>>>>>>> Failed Resource Actions:
>>>>>>>   * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82, status='Timed Out', exitreason='', last-rc-change='1970-01-01 01:00:54 +01:00', queued=29ms, exec=197799ms
>>>>>>>   * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61, status='Timed Out', exitreason='', last-rc-change='1970-01-01 01:00:54 +01:00', queued=1805ms, exec=198841ms
>>>>>> These show that attempts to stop failed, rather than start.
>>>>>>
>>>>>>> So every time I reboot my node, I need to start the resources manually
>>>>>>> using systemd, for example:
>>>>>>>
>>>>>>> systemctl start httpd
>>>>>>>
>>>>>>> and then pcs resource cleanup
>>>>>>>
>>>>>>> Resources configuration:
>>>>>>>
>>>>>>> Clone: apache-clone
>>>>>>>   Meta Attrs: maintenance=false
>>>>>>>   Resource: apache (class=systemd type=httpd)
>>>>>>>    Meta Attrs: maintenance=false
>>>>>>>    Operations: monitor interval=60 timeout=100 (apache-monitor-interval-60)
>>>>>>>                start interval=0s timeout=100 (apache-start-interval-0s)
>>>>>>>                stop interval=0s timeout=100 (apache-stop-interval-0s)
>>>>>>>
>>>>>>>
>>>>>>> Resource: openvpn (class=systemd type=openvpn-server@01-server)
>>>>>>>    Meta Attrs: maintenance=false
>>>>>>>    Operations: monitor interval=60 timeout=100 (openvpn-monitor-interval-60)
>>>>>>>                start interval=0s timeout=100 (openvpn-start-interval-0s)
>>>>>>>                stop interval=0s timeout=100 (openvpn-stop-interval-0s)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Btw, if i try a debug-start / debug-stop the mentioned resources
>>>>>>> start and stop ok.
>>>>>> Based on that, my first guess would be SELinux. Check the SELinux
>>>>>> logs for denials.
>>>>>>
>>>>>> Also, make sure your systemd services are not enabled in systemd
>>>>>> itself (e.g. via systemctl enable). Clustered systemd services
>>>>>> should be managed by the cluster only.

Hi Maverick,


You can replace 'crm resource stop' with 'pcs resource disable'.
The rest of the procedure works, but sadly not for systemd resources.

You can try:
'pcs resource debug-start <resource> --full'

Another approach is to:
1. Copy the service unit file to /etc/systemd/system
2. In the '[Service]' section add this:
Environment=SYSTEMD_LOG_LEVEL=debug
3. Reload systemd:
systemctl daemon-reload
4. Use 'debug-start --full'

Note: I assume you have downtime available for debugging the issue.

Note: Don't forget to remove the debug setting afterwards, or your journal will fill up.
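
To illustrate steps 1 to 3, here is a rough sketch, not a tested procedure. It assumes the packaged unit lives in /usr/lib/systemd/system (the usual location on Fedora) and that the unit is httpd.service. The variable and function names (SRC_DIR, DST_DIR, UNIT, add_debug_env, make_debug_copy) are made up for this example:

```shell
#!/bin/sh
# Sketch only. The defaults below are assumptions; adjust them to your system.
UNIT=${UNIT:-httpd.service}
SRC_DIR=${SRC_DIR:-/usr/lib/systemd/system}   # assumed location of the packaged unit
DST_DIR=${DST_DIR:-/etc/systemd/system}       # admin overrides go here

# Read a unit file on stdin and print it to stdout, adding the
# debug environment line right after the [Service] header.
add_debug_env() {
    awk '{ print }
         /^\[Service\]/ { print "Environment=SYSTEMD_LOG_LEVEL=debug" }'
}

# Write a modified copy of the unit into DST_DIR (steps 1 and 2).
make_debug_copy() {
    add_debug_env < "$SRC_DIR/$UNIT" > "$DST_DIR/$UNIT"
}
```

After calling make_debug_copy as root you would run 'systemctl daemon-reload' and then 'pcs resource debug-start apache --full' (apache being the resource name from this thread). When you are done, delete the copy from /etc/systemd/system and reload systemd again.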

Best Regards,
Strahil Nikolov

