[ClusterLabs] Resource not starting correctly IV

Tue Apr 16 16:09:28 EDT 2019

Thanks. In the end, I found out that my target application has a setting
that makes the application instantly detectable to the monitoring side
of my script. With that setting enabled, the associated resource starts
flawlessly every time.

On Tue, Apr 16, 2019 at 1:46 PM Jan Pokorný <jpokorny at redhat.com> wrote:

> [letter-casing wise:
>  it's either "Pacemaker" or down-to-the-terminal "pacemaker"]
>
> On 16/04/19 10:21 -0600, JCA wrote:
> > 2. It would seem that what Pacemaker is doing is the following:
> >    a. Check whether the app is running.
> >    b. If it is not, launch it.
> >    c. Check again.
> >    d. If running, exit.
> >    e. Otherwise, stop it.
> >    f. Launch it.
> >    g. Go to a.
> >
> > [...]
> >
> > 4. If the above is correct, and if I am getting the picture correctly, it
> > would seem that the problem is that my monitoring function does not detect
> > immediately that my app is up and running. That's clearly my problem.
> > However, is there any way to get Pacemaker to introduce a delay between
> > steps b and c in section 2 above?
>
> Ah, it should have occurred to me!
>
> The typical solution, I think, is to have a sleep loop following the
> daemon launch within the "start" action that runs (a subset of) what
> "monitor" normally does, so as to synchronize on the "service ready"
> moment.  The default timeout for "start" in the agent's metadata should
> then reflect the usual time it takes to reach the point where "monitor"
> is happy, plus some reserve.
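
A minimal sketch of the polling approach described above, written in Python purely for illustration (the agent discussed in this thread may well be a shell script). The names `start`, `launch` and `probe` are hypothetical: `launch` stands for whatever starts the daemon, and `probe` for the readiness check that "monitor" performs. The exit codes are the standard OCF ones.

```python
import time

OCF_SUCCESS = 0      # OCF exit code: action succeeded / resource is running
OCF_ERR_GENERIC = 1  # OCF exit code: generic failure
OCF_NOT_RUNNING = 7  # OCF exit code: resource is cleanly stopped


def start(launch, probe, timeout_s=20.0, interval_s=0.5,
          clock=time.monotonic, sleep=time.sleep):
    """Launch the daemon, then poll the monitor check until it is ready.

    `launch` and `probe` are hypothetical callables supplied by the agent.
    Returns OCF_SUCCESS once `probe` reports success, or OCF_ERR_GENERIC
    if the deadline passes first.
    """
    launch()
    deadline = clock() + timeout_s
    while True:
        if probe() == OCF_SUCCESS:
            return OCF_SUCCESS
        if clock() >= deadline:
            return OCF_ERR_GENERIC
        sleep(interval_s)
```

With this shape, "start" returns only once the same check that "monitor" relies on passes, so the first real "monitor" call no longer races against daemon startup.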
>
> Some agents may do more elaborate things, like precisely limiting such
> waiting with respect to the time they were actually given by the
> resource manager/pacemaker (if I remember correctly, that value is
> provided through environment variables for a sort of introspection).
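
One way to derive such a limit is sketched below, again in Python for illustration only. It assumes the operation timeout reaches the agent as `OCF_RESKEY_CRM_meta_timeout`, expressed in milliseconds; that variable name and unit are not stated in this thread and should be checked against the Pacemaker documentation for the version in use. The function name, the fallback default and the reserve are hypothetical.

```python
import os


def start_budget_seconds(environ=os.environ, default_s=20.0, reserve_s=2.0):
    """Return how long "start" may keep polling, in seconds.

    Assumption: the timeout is passed in OCF_RESKEY_CRM_meta_timeout as
    milliseconds. If it is missing or malformed, fall back to `default_s`.
    `reserve_s` is held back so the agent can still exit cleanly.
    """
    raw = environ.get("OCF_RESKEY_CRM_meta_timeout")
    try:
        total_s = int(raw) / 1000.0
    except (TypeError, ValueError):
        total_s = default_s
    return max(total_s - reserve_s, 0.0)
```

The result can be used as the deadline of the polling loop in "start", so the agent gives up on its own shortly before the cluster would time the action out.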
>
> Resource agent experts could advise here.
>
> (Truth be told, "daemon readiness" used to be a very marginalized
> problem that put barriers in the way of practical [= race-free]
> dependency ordering etc.  Luckily, clever people realized that the most
> precise tracking can only be in the hands of the actual daemon
> implementors if an event-driven paradigm is to be applied.  For instance,
> if you can influence my_app, and it's a standard forking daemon, it
> would be best if the parent exited only when the daemon is truly ready
> to provide service -- this usually requires some, typically
> signal-based, synchronization amongst the daemon processes.  With
> systemd, the situation is much simpler since no forking is necessary,
> just a call to sd_notify(3) -- in that case, though, your agent would
> need to mimic the server side of the sd_notify protocol since nothing
> would do it for you.)
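
A rough sketch of what mimicking the receiving side could look like, in Python for illustration. It relies on the documented behaviour of sd_notify(3): the daemon sends newline-separated `KEY=VALUE` datagrams to the AF_UNIX socket named by the `NOTIFY_SOCKET` environment variable, and `READY=1` signals readiness. The function names and the socket file name are hypothetical, and only filesystem socket paths are handled here.

```python
import os
import socket
import time


def open_notify_socket(directory):
    """Bind a datagram socket the daemon can report readiness to."""
    path = os.path.join(directory, "notify.sock")  # hypothetical file name
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock, path


def wait_for_ready(sock, timeout_s):
    """Return True once a datagram containing READY=1 arrives, else False."""
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        sock.settimeout(remaining)
        try:
            datagram = sock.recv(4096)
        except socket.timeout:
            return False
        # A datagram may carry several assignments, one per line.
        if b"READY=1" in datagram.split(b"\n"):
            return True
```

In use, the agent would bind the socket, put its path into `NOTIFY_SOCKET` in the daemon's environment, launch the daemon, and then call `wait_for_ready` with whatever time "start" has left.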
>
> > 5. Following up on 4: if my script sleeps for a few seconds immediately
> > after launching my app (it's a daemon) in myapp_start then everything works
> > fine. Indeed, the call sequence in node one now becomes:
> >
> >          monitor:
> >
> >     Status: NOT_RUNNING
> >     Exit: NOT_RUNNING
> >
> >           start:
> >
> >     Validate: SUCCESS
> >     Status: NOT_RUNNING
> >     Start: SUCCESS
> >     Exit: SUCCESS
> >
> >           monitor:
> >
> >     Status: SUCCESS
> >     Exit: SUCCESS
>
> That's easier but less effective and reliable (more opportunistic than
> fact-based) than polling the "monitor" outcomes privately within "start"
> as sketched above.
>
> --
> Jan (Poki)