[ClusterLabs] Fedora 31 - systemd based resources don't start

Strahil Nikolov hunter86_bg at yahoo.com
Thu Feb 20 15:02:15 EST 2020


On February 20, 2020 9:35:07 PM GMT+02:00, Maverick <mvrk at sapo.pt> wrote:
>
>Manually it starts ok, no problems:
>
>pcs resource debug-start apache --full
>(unpack_config)     warning: Blind faith: not fencing unseen nodes
>Operation start for apache (systemd::httpd) returned: 'ok' (0)
>
>
>On 20/02/2020 16:46, Strahil Nikolov wrote:
>> On February 20, 2020 12:49:43 PM GMT+02:00, Maverick <mvrk at sapo.pt>
>wrote:
>>>> You really need to debug the start & stop of  tthe resource .
>>>>
>>>> Please try the debug procedure  and provide the output:
>>>> https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures
>>>>
>>>> Best Regards,
>>>> Strahil Nikolov
>>>
>>> Hi,
>>>
>>> Correct me if i'm wrong, but i think that procedure doesn't work for
>>> systemd class resources, i don't know which OCF script is
>responsible
>>> for handling systemd class resources.
>>>
>>> Also crm command doesn't exist in RHEL/Fedora, i've seen the crm
>>> command
>>> only in SUSE.
>>>
>>>
>>>
>>> On 19/02/2020 19:23, Strahil Nikolov wrote:
>>>> On February 19, 2020 7:21:12 PM GMT+02:00, Maverick <mvrk at sapo.pt>
>>> wrote:
>>>>> How is it possible that pacemaker is reporting that takes 4.2
>>> minutes
>>>>> (254930ms) to execute the start of httpd systemd unit?
>>>>>
>>>>> Feb 19 17:04:09 boss1 pacemaker-execd     [1514] (log_execute)    
>>>>> info:
>>>>> executing - rsc:apache action:start call_id:25
>>>>> Feb 19 17:04:09 boss1 pacemaker-execd     [1514]
>(systemd_unit_exec)
>>>>>    
>>>>> debug: Performing asynchronous start op on systemd unit httpd
>named
>>>>> 'apache'
>>>>> Feb 19 17:04:09 boss1 pacemaker-execd     [1514]
>>>>> (systemd_unit_exec_with_unit)     debug: Calling StartUnit for
>>> apache:
>>>>> /org/freedesktop/systemd1/unit/httpd_2eservice
>>>>> Feb 19 17:04:10 boss1 pacemaker-execd     [1514] (action_complete)
>>>    
>>>>> notice: Giving up on apache start (rc=0): timeout
>(elapsed=254930ms,
>>>>> remaining=-154930ms)
>>>>> Feb 19 17:04:10 boss1 pacemaker-execd     [1514] (log_finished)
>   
>>>>> debug: finished - rsc:apache action:monitor call_id:25 
>>> exit-code:198
>>>>> exec-time:254935ms queue-time:235ms
>>>>>
>>>>>
>>>>> Starting manually works fine and fast:
>>>>>
>>>>> # time systemctl start httpd
>>>>> real    0m0.144s
>>>>> user    0m0.005s
>>>>> sys    0m0.008s
>>>>>
>>>>>
>>>>> On 17/02/2020 22:47, Mvrk wrote:
>>>>>> In attachment the pacemaker.log. On the log i can see that the
>>>>> cluster
>>>>>> tries to start, the start fails, then tries to stop, and the stop
>>>>> also
>>>>>> fails also.
>>>>>>
>>>>>> One more thing, my cluster was working fine on Fedora 28, i
>started
>>>>>> having this problem after upgrade to Fedora 31.
>>>>>>
>>>>>> On 17/02/2020 21:30, Ricardo Esteves wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Yes, i also don't understand why is trying to stop them first.
>>>>>>>
>>>>>>> SELinux is disabled:
>>>>>>>
>>>>>>> # getenforce
>>>>>>> Disabled
>>>>>>>
>>>>>>> All systemd services controlled by the cluster are disabled from
>>>>>>> starting at boot:
>>>>>>>
>>>>>>> # systemctl is-enabled httpd
>>>>>>> disabled
>>>>>>>
>>>>>>> # systemctl is-enabled openvpn-server at 01-server
>>>>>>> disabled
>>>>>>>
>>>>>>>
>>>>>>> On 17/02/2020 20:28, Ken Gaillot wrote:
>>>>>>>> On Mon, 2020-02-17 at 17:35 +0000, Maverick wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> When i start my cluster, most of my systemd resources won't
>>> start:
>>>>>>>>> Failed Resource Actions:
>>>>>>>>>   * apache_stop_0 on boss1 'OCF_TIMEOUT' (198): call=82,
>>>>>>>>> status='Timed Out', exitreason='', last-rc-change='1970-01-01
>>>>>>>>> 01:00:54 +01:00', queued=29ms, exec=197799ms
>>>>>>>>>   * openvpn_stop_0 on boss1 'OCF_TIMEOUT' (198): call=61,
>>>>>>>>> status='Timed Out', exitreason='', last-rc-change='1970-01-01
>>>>>>>>> 01:00:54 +01:00', queued=1805ms, exec=198841ms
>>>>>>>> These show that attempts to stop failed, rather than start.
>>>>>>>>
>>>>>>>>> So everytime i reboot my node, i need to start the resources
>>>>> manually
>>>>>>>>> using systemd, for example:
>>>>>>>>>
>>>>>>>>> systemd start apache
>>>>>>>>>
>>>>>>>>> and then pcs resource cleanup
>>>>>>>>>
>>>>>>>>> Resources configuration:
>>>>>>>>>
>>>>>>>>> Clone: apache-clone
>>>>>>>>>   Meta Attrs: maintenance=false
>>>>>>>>>   Resource: apache (class=systemd type=httpd)
>>>>>>>>>    Meta Attrs: maintenance=false
>>>>>>>>>    Operations: monitor interval=60 timeout=100
>(apache-monitor-
>>>>>>>>> interval-60)
>>>>>>>>>                start interval=0s timeout=100
>>>>> (apache-start-interval-
>>>>>>>>> 0s)
>>>>>>>>>                stop interval=0s timeout=100
>>>>> (apache-stop-interval-0s)
>>>>>>>>>
>>>>>>>>> Resource: openvpn (class=systemd
>type=openvpn-server at 01-server)
>>>>>>>>>    Meta Attrs: maintenance=false
>>>>>>>>>    Operations: monitor interval=60 timeout=100
>(openvpn-monitor-
>>>>>>>>> interval-60)
>>>>>>>>>                start interval=0s timeout=100
>>>>> (openvpn-start-interval-
>>>>>>>>> 0s)
>>>>>>>>>                stop interval=0s timeout=100
>>>>> (openvpn-stop-interval-
>>>>>>>>> 0s)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Btw, if i try a debug-start / debug-stop the mentioned
>resources
>>>>>>>>> start and stop ok.
>>>>>>>> Based on that, my first guess would be SELinux. Check the
>SELinux
>>>>> logs
>>>>>>>> for denials.
>>>>>>>>
>>>>>>>> Also, make sure your systemd services are not enabled in
>systemd
>>>>> itself
>>>>>>>> (e.g. via systemctl enable). Clustered systemd services should
>be
>>>>>>>> managed by the cluster only.
>>>>> _______________________________________________
>>>>> Manage your subscription:
>>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>>
>>>>> ClusterLabs home: https://www.clusterlabs.org/
>>>> You really need to debug the start & stop of  tthe resource .
>>>>
>>>> Please try the debug procedure  and provide the output:
>>>> https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures
>>>>
>>>> Best Regards,
>>>> Strahil Nikolov
>> Hi Maverick,
>>
>>
>> you can replace 'crm resource stop' with 'pcs  resource disable'.
>> The rest is working, but sadly not for systemd.
>>
>> You can try to:
>> 'pcs resource debug-start <resource> --full'
>> Another approach is to:
>> 1. Copy service  to /etc/systemd/system
>> 2. In '[service]' section add this:
>> Environment=SYSTEMD_LOG_LEVEL=debug
>> 3. Reload  systemd:
>> systemctl daemon_reload
>> Note: I assume you got downtime for debugging the issue
>> 4. Use  'debug-start --full'
>>
>> Note: Don't forget to remove the debug, or your journal will get
>full.
>>
>> Best Regards,
>> Strahil Nikolov

Hi Maverick,


According this thread:
https://lists.clusterlabs.org/pipermail/users/2016-December/021053.html

You have 'startup-fencing' is set  to false.

Check it out - maybe this is your reason.

Best Regards,
Strahil Nikolov


More information about the Users mailing list