[ClusterLabs] Q: starting systemd resources
Reid Wahl
nwahl at redhat.com
Fri Feb 5 05:08:38 EST 2021
Hi, Ulrich. I presume you're talking about the log_finished line, which
looks like this in pacemaker.log:
Feb 05 01:48:47.192 fastvm-rhel-8-0-23 pacemaker-execd [15446]
(log_finished at execd_commands.c:214) info: dummy start (call 23, PID 18743)
exited with status 0 (execution time 11ms, queue time 0ms)
Is that correct?
I got curious and took a look. It looks like this comes down to the
action_complete() function:
-
https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.5/daemons/execd/execd_commands.c#L896-L1050
This function calls cmd_finalize() at the end, which calls a chain of
functions ending in log_finished(). log_finished() logs the "start ...
exited with status" line shown above.
However, systemd resources have to be handled a bit differently. We have to
schedule a monitor operation to run after the start operation, in order to
check whether the start was truly a success.
~~~
if (pcmk__str_eq(rclass, PCMK_RESOURCE_CLASS_SYSTEMD, pcmk__str_casei))
{
if ((cmd->exec_rc == PCMK_OCF_OK)
&& pcmk__strcase_any_of(cmd->action, "start", "stop", NULL)) {
/* systemd returns from start and stop actions after the action
* begins, not after it completes. We have to jump through a few
* hoops so that we don't report 'complete' to the rest of
pacemaker
* until it's actually done.
*/
goagain = true;
cmd->real_action = cmd->action;
cmd->action = strdup("monitor");
...
if (goagain) {
...
schedule_lrmd_cmd(rsc, cmd);
/* Don't finalize cmd, we're not done with it yet */
return;
~~~
So for the start operation, it never reaches the cmd_finalize() call at the
end, until the follow-up monitor runs. The follow-up monitor operation does
end up calling cmd_finalize() at the end of action_complete. But the
"log_finished" message is logged at debug level for monitor operations. So
you won't see it unless debugging is enabled.
Does this make sense?
Example:
~~~
Feb 05 02:06:51.123 fastvm-rhel-8-0-23 pacemaker-execd [19354]
(log_execute) info: executing - rsc:nfs-daemon action:start
call_id:20
Feb 05 02:06:51.123 fastvm-rhel-8-0-23 pacemaker-execd [19354]
(systemd_unit_exec) debug: Performing asynchronous start op on systemd
unit nfs-server.service named 'nfs-daemon'
Feb 05 02:06:51.124 fastvm-rhel-8-0-23 pacemaker-execd [19354]
(systemd_unit_exec_with_unit) debug: Calling StartUnit for
nfs-daemon: /org/freedesktop/systemd1/unit/nfs_2dserver_2eservice
Feb 05 02:06:51.517 fastvm-rhel-8-0-23 pacemaker-execd [19354]
(action_complete) debug: nfs-daemon start may still be in progress:
re-scheduling (elapsed=394ms, remaining=99606ms, start_delay=2000ms)
Feb 05 02:06:53.518 fastvm-rhel-8-0-23 pacemaker-execd [19354]
(log_execute) debug: executing - rsc:nfs-daemon action:monitor
call_id:20
Feb 05 02:06:53.518 fastvm-rhel-8-0-23 pacemaker-execd [19354]
(systemd_unit_exec) debug: Performing asynchronous status op on systemd
unit nfs-server.service named 'nfs-daemon'
Feb 05 02:06:53.521 fastvm-rhel-8-0-23 pacemaker-execd [19354]
(action_complete) debug: nfs-daemon systemd start is now complete
(elapsed=2397ms, remaining=97603ms): ok (0)
Feb 05 02:06:53.521 fastvm-rhel-8-0-23 pacemaker-execd [19354]
(log_finished) debug: nfs-daemon monitor (call 20) exited with status
0 (execution time 2397ms, queue time 0ms)
~~~
On Thu, Feb 4, 2021 at 11:25 PM Ulrich Windl <
Ulrich.Windl at rz.uni-regensburg.de> wrote:
> Hi!
>
> While analyzing cluster problems I noticed this:
> Normal resources executed via OCF RAs create two log entries by
> pacemaker-execd: One when starting the resource and another when the
> resource completed starting.
> However for systemd units I only get a start message. Is that intentional?
> Does that mean systemd starts are asynchronous in general (i.e.: The
> process returns before start in complete)?
> (Still I get a completed message from pacemaker-controld)
>
> Example:
> Feb 04 15:41:25 h19 pacemaker-execd[7793]: notice: executing -
> rsc:prm_virtlockd action:start call_id:95
> Feb 04 15:41:27 h19 pacemaker-execd[7793]: notice: executing -
> rsc:prm_libvirtd action:start call_id:97
>
> So one could guess that lirtlockd and libvirtd are staring concurrently,
> but the did not because of this sequence:
> Feb 04 15:41:27 h19 pacemaker-controld[7796]: notice: Result of start
> operation for prm_virtlockd on h19: ok
> Feb 04 15:41:27 h19 pacemaker-execd[7793]: notice: executing -
> rsc:prm_libvirtd action:start call_id:97
>
> Regards,
> Ulrich
>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
>
--
Regards,
Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20210205/405e7f32/attachment.htm>
More information about the Users
mailing list