[ClusterLabs] Antw: [EXT] Re: Q: starting systemd resources

Fri Feb 5 06:19:46 EST 2021

Hi Reid!

Thanks for the explanation. Picking the right log levels is not trivial, but I'd expect them to be somewhat consistent:
If for non-systemd I see separate start and finished messages, I'd also expect them for systemd resources, or vice-versa.
Otherwise it will make reading the logs harder, specifically when several resources are starting concurrently.

Regards,
Ulrich

>>> Reid Wahl <nwahl at redhat.com> schrieb am 05.02.2021 um 11:08 in Nachricht
<CAPiuu99SzKMsrxG9dw4y5+ZOFcNG3EAHpJP=zFUbcTibVzuDLw at mail.gmail.com>:
> Hi, Ulrich. I presume you're talking about the log_finished line, which
> looks like this in pacemaker.log:
> 
> Feb 05 01:48:47.192 fastvm-rhel-8-0-23 pacemaker-execd     [15446]
> (log_finished at execd_commands.c:214)  info: dummy start (call 23, PID 18743)
> exited with status 0 (execution time 11ms, queue time 0ms)
> 
> Is that correct?

Yes. The detail is visible in pacemaker.log only.

> 
> I got curious and took a look. It looks like this comes down to the
> action_complete() function:
>   -
> https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.5/daemons/execd/ 
> execd_commands.c#L896-L1050
> 
> This function calls cmd_finalize() at the end, which calls a chain of
> functions ending in log_finished(). log_finished() logs the "start ...
> exited with status" line shown above.
> 
> However, systemd resources have to be handled a bit differently. We have to
> schedule a monitor operation to run after the start operation, in order to
> check whether the start was truly a success.

Doesn't systemctl return a proper exit status?

> ~~~
>     if (pcmk__str_eq(rclass, PCMK_RESOURCE_CLASS_SYSTEMD, pcmk__str_casei))
> {
>         if ((cmd->exec_rc == PCMK_OCF_OK)
>             && pcmk__strcase_any_of(cmd->action, "start", "stop", NULL)) {
>             /* systemd returns from start and stop actions after the action
>              * begins, not after it completes. We have to jump through a few
>              * hoops so that we don't report 'complete' to the rest of
> pacemaker
>              * until it's actually done.
>              */
>             goagain = true;
>             cmd->real_action = cmd->action;
>             cmd->action = strdup("monitor");
> ...
>     if (goagain) {
>     ...
>             schedule_lrmd_cmd(rsc, cmd);
> 
>             /* Don't finalize cmd, we're not done with it yet */
>             return;
> ~~~
> 
> So for the start operation, it never reaches the cmd_finalize() call at the
> end, until the follow-up monitor runs. The follow-up monitor operation does
> end up calling cmd_finalize() at the end of action_complete. But the
> "log_finished" message is logged at debug level for monitor operations. So
> you won't see it unless debugging is enabled.
> 
> Does this make sense?

Yes.

> 
> Example:
> ~~~
> Feb 05 02:06:51.123 fastvm-rhel-8-0-23 pacemaker-execd     [19354]
> (log_execute)        info: executing - rsc:nfs-daemon action:start
> call_id:20
> Feb 05 02:06:51.123 fastvm-rhel-8-0-23 pacemaker-execd     [19354]
> (systemd_unit_exec)  debug: Performing asynchronous start op on systemd
> unit nfs-server.service named 'nfs-daemon'
> Feb 05 02:06:51.124 fastvm-rhel-8-0-23 pacemaker-execd     [19354]
> (systemd_unit_exec_with_unit)        debug: Calling StartUnit for
> nfs-daemon: /org/freedesktop/systemd1/unit/nfs_2dserver_2eservice
> Feb 05 02:06:51.517 fastvm-rhel-8-0-23 pacemaker-execd     [19354]
> (action_complete)    debug: nfs-daemon start may still be in progress:
> re-scheduling (elapsed=394ms, remaining=99606ms, start_delay=2000ms)
> Feb 05 02:06:53.518 fastvm-rhel-8-0-23 pacemaker-execd     [19354]
> (log_execute)        debug: executing - rsc:nfs-daemon action:monitor
> call_id:20
> Feb 05 02:06:53.518 fastvm-rhel-8-0-23 pacemaker-execd     [19354]
> (systemd_unit_exec)  debug: Performing asynchronous status op on systemd
> unit nfs-server.service named 'nfs-daemon'
> Feb 05 02:06:53.521 fastvm-rhel-8-0-23 pacemaker-execd     [19354]
> (action_complete)    debug: nfs-daemon systemd start is now complete
> (elapsed=2397ms, remaining=97603ms): ok (0)
> Feb 05 02:06:53.521 fastvm-rhel-8-0-23 pacemaker-execd     [19354]
> (log_finished)       debug: nfs-daemon monitor (call 20) exited with status
> 0 (execution time 2397ms, queue time 0ms)
> ~~~
> 
> 
> 
> On Thu, Feb 4, 2021 at 11:25 PM Ulrich Windl <
> Ulrich.Windl at rz.uni-regensburg.de> wrote:
> 
>> Hi!
>>
>> While analyzing cluster problems I noticed this:
>> Normal resources executed via OCF RAs create two log entries by
>> pacemaker-execd: One when starting the resource and another when the
>> resource completed starting.
>> However for systemd units I only get a start message. Is that intentional?
>> Does that mean systemd starts are asynchronous in general (i.e.: The
>> process returns before start in complete)?
>> (Still I get a completed message from pacemaker-controld)
>>
>> Example:
>> Feb 04 15:41:25 h19 pacemaker-execd[7793]:  notice: executing -
>> rsc:prm_virtlockd action:start call_id:95
>> Feb 04 15:41:27 h19 pacemaker-execd[7793]:  notice: executing -
>> rsc:prm_libvirtd action:start call_id:97
>>
>> So one could guess that lirtlockd and libvirtd are staring concurrently,
>> but the did not because of this sequence:
>> Feb 04 15:41:27 h19 pacemaker-controld[7796]:  notice: Result of start
>> operation for prm_virtlockd on h19: ok
>> Feb 04 15:41:27 h19 pacemaker-execd[7793]:  notice: executing -
>> rsc:prm_libvirtd action:start call_id:97
>>
>> Regards,
>> Ulrich
>>
>>
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>>
>> ClusterLabs home: https://www.clusterlabs.org/ 
>>
>>
> 
> -- 
> Regards,
> 
> Reid Wahl, RHCA
> Senior Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA