[ClusterLabs] Q: starting systemd resources

Fri Feb 5 05:08:38 EST 2021

Hi, Ulrich. I presume you're talking about the log_finished line, which
looks like this in pacemaker.log:

Feb 05 01:48:47.192 fastvm-rhel-8-0-23 pacemaker-execd     [15446]
(log_finished at execd_commands.c:214)  info: dummy start (call 23, PID 18743)
exited with status 0 (execution time 11ms, queue time 0ms)

Is that correct?

I got curious and took a look. It looks like this comes down to the
action_complete() function:
  -
https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.5/daemons/execd/execd_commands.c#L896-L1050

This function calls cmd_finalize() at the end, which calls a chain of
functions ending in log_finished(). log_finished() logs the "start ...
exited with status" line shown above.

However, systemd resources have to be handled a bit differently. We have to
schedule a monitor operation to run after the start operation, in order to
check whether the start was truly a success.
~~~
    if (pcmk__str_eq(rclass, PCMK_RESOURCE_CLASS_SYSTEMD, pcmk__str_casei))
{
        if ((cmd->exec_rc == PCMK_OCF_OK)
            && pcmk__strcase_any_of(cmd->action, "start", "stop", NULL)) {
            /* systemd returns from start and stop actions after the action
             * begins, not after it completes. We have to jump through a few
             * hoops so that we don't report 'complete' to the rest of
pacemaker
             * until it's actually done.
             */
            goagain = true;
            cmd->real_action = cmd->action;
            cmd->action = strdup("monitor");
...
    if (goagain) {
    ...
            schedule_lrmd_cmd(rsc, cmd);

            /* Don't finalize cmd, we're not done with it yet */
            return;
~~~

So for the start operation, it never reaches the cmd_finalize() call at the
end, until the follow-up monitor runs. The follow-up monitor operation does
end up calling cmd_finalize() at the end of action_complete. But the
"log_finished" message is logged at debug level for monitor operations. So
you won't see it unless debugging is enabled.

Does this make sense?

Example:
~~~
Feb 05 02:06:51.123 fastvm-rhel-8-0-23 pacemaker-execd     [19354]
(log_execute)        info: executing - rsc:nfs-daemon action:start
call_id:20
Feb 05 02:06:51.123 fastvm-rhel-8-0-23 pacemaker-execd     [19354]
(systemd_unit_exec)  debug: Performing asynchronous start op on systemd
unit nfs-server.service named 'nfs-daemon'
Feb 05 02:06:51.124 fastvm-rhel-8-0-23 pacemaker-execd     [19354]
(systemd_unit_exec_with_unit)        debug: Calling StartUnit for
nfs-daemon: /org/freedesktop/systemd1/unit/nfs_2dserver_2eservice
Feb 05 02:06:51.517 fastvm-rhel-8-0-23 pacemaker-execd     [19354]
(action_complete)    debug: nfs-daemon start may still be in progress:
re-scheduling (elapsed=394ms, remaining=99606ms, start_delay=2000ms)
Feb 05 02:06:53.518 fastvm-rhel-8-0-23 pacemaker-execd     [19354]
(log_execute)        debug: executing - rsc:nfs-daemon action:monitor
call_id:20
Feb 05 02:06:53.518 fastvm-rhel-8-0-23 pacemaker-execd     [19354]
(systemd_unit_exec)  debug: Performing asynchronous status op on systemd
unit nfs-server.service named 'nfs-daemon'
Feb 05 02:06:53.521 fastvm-rhel-8-0-23 pacemaker-execd     [19354]
(action_complete)    debug: nfs-daemon systemd start is now complete
(elapsed=2397ms, remaining=97603ms): ok (0)
Feb 05 02:06:53.521 fastvm-rhel-8-0-23 pacemaker-execd     [19354]
(log_finished)       debug: nfs-daemon monitor (call 20) exited with status
0 (execution time 2397ms, queue time 0ms)
~~~

On Thu, Feb 4, 2021 at 11:25 PM Ulrich Windl <
Ulrich.Windl at rz.uni-regensburg.de> wrote:

> Hi!
>
> While analyzing cluster problems I noticed this:
> Normal resources executed via OCF RAs create two log entries by
> pacemaker-execd: One when starting the resource and another when the
> resource completed starting.
> However for systemd units I only get a start message. Is that intentional?
> Does that mean systemd starts are asynchronous in general (i.e.: The
> process returns before start in complete)?
> (Still I get a completed message from pacemaker-controld)
>
> Example:
> Feb 04 15:41:25 h19 pacemaker-execd[7793]:  notice: executing -
> rsc:prm_virtlockd action:start call_id:95
> Feb 04 15:41:27 h19 pacemaker-execd[7793]:  notice: executing -
> rsc:prm_libvirtd action:start call_id:97
>
> So one could guess that lirtlockd and libvirtd are staring concurrently,
> but the did not because of this sequence:
> Feb 04 15:41:27 h19 pacemaker-controld[7796]:  notice: Result of start
> operation for prm_virtlockd on h19: ok
> Feb 04 15:41:27 h19 pacemaker-execd[7793]:  notice: executing -
> rsc:prm_libvirtd action:start call_id:97
>
> Regards,
> Ulrich
>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>

-- 
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20210205/405e7f32/attachment.htm>