[ClusterLabs] Antw: [EXT] Re: Q: starting systemd resources
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Fri Feb 5 06:19:46 EST 2021
Hi Reid!
Thanks for the explanation. Picking the right log levels is not trivial, but I'd expect them to be somewhat consistent:
If for non-systemd I see separate start and finished messages, I'd also expect them for systemd resources, or vice-versa.
Otherwise it will make reading the logs harder, specifically when several resources are starting concurrently.
Regards,
Ulrich
>>> Reid Wahl <nwahl at redhat.com> schrieb am 05.02.2021 um 11:08 in Nachricht
<CAPiuu99SzKMsrxG9dw4y5+ZOFcNG3EAHpJP=zFUbcTibVzuDLw at mail.gmail.com>:
> Hi, Ulrich. I presume you're talking about the log_finished line, which
> looks like this in pacemaker.log:
>
> Feb 05 01:48:47.192 fastvm-rhel-8-0-23 pacemaker-execd [15446]
> (log_finished at execd_commands.c:214) info: dummy start (call 23, PID 18743)
> exited with status 0 (execution time 11ms, queue time 0ms)
>
> Is that correct?
Yes. The detail is visible in pacemaker.log only.
>
> I got curious and took a look. It looks like this comes down to the
> action_complete() function:
> -
> https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.5/daemons/execd/
> execd_commands.c#L896-L1050
>
> This function calls cmd_finalize() at the end, which calls a chain of
> functions ending in log_finished(). log_finished() logs the "start ...
> exited with status" line shown above.
>
> However, systemd resources have to be handled a bit differently. We have to
> schedule a monitor operation to run after the start operation, in order to
> check whether the start was truly a success.
Doesn't systemctl return a proper exit status?
> ~~~
> if (pcmk__str_eq(rclass, PCMK_RESOURCE_CLASS_SYSTEMD, pcmk__str_casei))
> {
> if ((cmd->exec_rc == PCMK_OCF_OK)
> && pcmk__strcase_any_of(cmd->action, "start", "stop", NULL)) {
> /* systemd returns from start and stop actions after the action
> * begins, not after it completes. We have to jump through a few
> * hoops so that we don't report 'complete' to the rest of
> pacemaker
> * until it's actually done.
> */
> goagain = true;
> cmd->real_action = cmd->action;
> cmd->action = strdup("monitor");
> ...
> if (goagain) {
> ...
> schedule_lrmd_cmd(rsc, cmd);
>
> /* Don't finalize cmd, we're not done with it yet */
> return;
> ~~~
>
> So for the start operation, it never reaches the cmd_finalize() call at the
> end, until the follow-up monitor runs. The follow-up monitor operation does
> end up calling cmd_finalize() at the end of action_complete. But the
> "log_finished" message is logged at debug level for monitor operations. So
> you won't see it unless debugging is enabled.
>
> Does this make sense?
Yes.
>
> Example:
> ~~~
> Feb 05 02:06:51.123 fastvm-rhel-8-0-23 pacemaker-execd [19354]
> (log_execute) info: executing - rsc:nfs-daemon action:start
> call_id:20
> Feb 05 02:06:51.123 fastvm-rhel-8-0-23 pacemaker-execd [19354]
> (systemd_unit_exec) debug: Performing asynchronous start op on systemd
> unit nfs-server.service named 'nfs-daemon'
> Feb 05 02:06:51.124 fastvm-rhel-8-0-23 pacemaker-execd [19354]
> (systemd_unit_exec_with_unit) debug: Calling StartUnit for
> nfs-daemon: /org/freedesktop/systemd1/unit/nfs_2dserver_2eservice
> Feb 05 02:06:51.517 fastvm-rhel-8-0-23 pacemaker-execd [19354]
> (action_complete) debug: nfs-daemon start may still be in progress:
> re-scheduling (elapsed=394ms, remaining=99606ms, start_delay=2000ms)
> Feb 05 02:06:53.518 fastvm-rhel-8-0-23 pacemaker-execd [19354]
> (log_execute) debug: executing - rsc:nfs-daemon action:monitor
> call_id:20
> Feb 05 02:06:53.518 fastvm-rhel-8-0-23 pacemaker-execd [19354]
> (systemd_unit_exec) debug: Performing asynchronous status op on systemd
> unit nfs-server.service named 'nfs-daemon'
> Feb 05 02:06:53.521 fastvm-rhel-8-0-23 pacemaker-execd [19354]
> (action_complete) debug: nfs-daemon systemd start is now complete
> (elapsed=2397ms, remaining=97603ms): ok (0)
> Feb 05 02:06:53.521 fastvm-rhel-8-0-23 pacemaker-execd [19354]
> (log_finished) debug: nfs-daemon monitor (call 20) exited with status
> 0 (execution time 2397ms, queue time 0ms)
> ~~~
>
>
>
> On Thu, Feb 4, 2021 at 11:25 PM Ulrich Windl <
> Ulrich.Windl at rz.uni-regensburg.de> wrote:
>
>> Hi!
>>
>> While analyzing cluster problems I noticed this:
>> Normal resources executed via OCF RAs create two log entries by
>> pacemaker-execd: One when starting the resource and another when the
>> resource completed starting.
>> However for systemd units I only get a start message. Is that intentional?
>> Does that mean systemd starts are asynchronous in general (i.e.: The
>> process returns before start in complete)?
>> (Still I get a completed message from pacemaker-controld)
>>
>> Example:
>> Feb 04 15:41:25 h19 pacemaker-execd[7793]: notice: executing -
>> rsc:prm_virtlockd action:start call_id:95
>> Feb 04 15:41:27 h19 pacemaker-execd[7793]: notice: executing -
>> rsc:prm_libvirtd action:start call_id:97
>>
>> So one could guess that lirtlockd and libvirtd are staring concurrently,
>> but the did not because of this sequence:
>> Feb 04 15:41:27 h19 pacemaker-controld[7796]: notice: Result of start
>> operation for prm_virtlockd on h19: ok
>> Feb 04 15:41:27 h19 pacemaker-execd[7793]: notice: executing -
>> rsc:prm_libvirtd action:start call_id:97
>>
>> Regards,
>> Ulrich
>>
>>
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
>>
>
> --
> Regards,
>
> Reid Wahl, RHCA
> Senior Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA
More information about the Users
mailing list