[ClusterLabs Developers] RA as a systemd wrapper -- the right way?

Jan Pokorný jpokorny at redhat.com
Sat Dec 2 15:00:25 EST 2017


On 07/11/17 02:01 +0100, Jan Pokorný wrote:
> On 07/11/17 01:02 +0300, Andrei Borzenkov wrote:
>> 06.11.2017 22:38, Valentin Vidic пишет:
>>> On Fri, Oct 13, 2017 at 02:07:33PM +0100, Adam Spiers wrote:
>>>> I think it depends on exactly what you mean by "synchronous" here. You can
>>>> start up a daemon, or a process which is responsible for forking into a
>>>> daemon, but how can you know for sure that a service is really up and
>>>> running?  Even if the daemon ran for a few seconds, it might die soon after.
>>>> At what point do you draw the line and say "OK start-up is now over, any
>>>> failures after this are failures of a running service"?  In that light,
>>>> "systemctl start" could return at a number of points in the startup process,
>>>> but there's probably always an element of asynchronicity in there.
>>>> Interested to hear other opinions on this.
>>> 
>>> systemd.service(5) describes a started (running) service depending
>>> on the service type:
>>> 
>>> simple  - systemd will immediately proceed starting follow-up units (after exec)
>>> forking - systemd will proceed with starting follow-up units as soon as
>>>           the parent process exits
>>> oneshot - process has to exit before systemd starts follow-up units
>>> dbus    - systemd will proceed with starting follow-up units after the
>>>           D-Bus bus name has been acquired
>>> notify  - systemd will proceed with starting follow-up units after this
>>>           notification message has been sent
>>> 
>>> Obviously notify is best here
>> 
>> forking, dbus and notify all allow the daemon to signal to systemd
>> that it is ready to service requests.  Unfortunately ...
>> 
>>> but not all daemons implement sending
>>> sd_notify(READY=1) when they are ready to serve clients.
>>> 
>> 
>> ... just as not all daemons properly daemonize themselves or
>> register on D-Bus only after they are ready.
> 
> I share the sentiment about the situation.  It probably arises
> primarily from daemon authors never having been pushed to indicate
> full readiness to provide service, precisely because 1/ that's not
> the primary objective of init systems -- the only thing daemons ever
> needed to comply with regarding getting started (as opposed to real
> service-oriented supervisors, which is also the realm of HA, right?),
> and 2/ even if indicating that had been desirable, no formalized
> interface (and, in turn, no system conventions) was ever devised for
> the purpose that would become widespread.  On the other hand,
> sd_notify seems to reconcile that in my eyes (+1 to Valentin's
> qualifying it as the best of the above options), as it doesn't
> impose any other effect (casting extra interpretation on, say, a
> fork event makes readiness a possibly unintended, or at least not
> well-timed, side-effect of the main, intended effect).

I had some information deficits that are only now being remedied.
Specifically, I discovered this nice, elaborate study on the
"Readiness protocol problems with Unix dæmons":
https://jdebp.eu/FGA/unix-daemon-readiness-protocol-problems.html

Quoting it:
  Of course, only the service program itself can determine exactly
  when this point [of being ready, i.e. when it "is about to enter
  its main request processing loop"] is.

There's no way around this.

The whole objective of the OCF standard [*] looks retrospectively
pretty sidetracked through this lens: instead of using the weight of
a semiformal standardization body (comprising significant industry
players) to raise awareness of this solvable reliability gap, and
possibly contributing to a generally acknowledged, resource manager
agnostic solution (one that could have been inherited by the next
generation of init systems), it merely put a bit of systematic
approach to configuration management and monitoring on top of the
legacy of organically grown "good enough" initscripts.  Those are
clearly (because of inherent raciness and whatnot) not very suitable
for the act of supervision, nor for any sort of reactive balancing
to satisfy the requirements -- crucial in HA, where a polling
interval-based approach needlessly loses trailing nines in cases you
could be notified about directly.

Basically, that page also provides an overview of the existing
"formalized interfaces" I had in mind above, in its "Several
incompatible protocols with low adoption" section, including the
mentioned sd_notify way of doing this in the systemd realm (along
with its criticism).
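
For illustration, here is a minimal sketch of that handshake,
assuming libsystemd's sd_notify(3) is available and the unit is
declared Type=notify; the setup and loop are placeholders, not any
particular daemon's code:

  /* Minimal sketch, assuming libsystemd; build with e.g.
   *   cc -o exampled exampled.c $(pkg-config --cflags --libs libsystemd)
   * and run from a (hypothetical) unit using Type=notify. */
  #include <stdio.h>
  #include <unistd.h>
  #include <systemd/sd-daemon.h>

  int main(void)
  {
      /* ... bind sockets, load configuration, allocate resources ... */

      /* Only once the daemon is genuinely about to enter its request
       * processing loop does it announce readiness to the supervisor. */
      if (sd_notify(0, "READY=1") <= 0)
          fprintf(stderr, "readiness notification not delivered "
                          "(not running under systemd?)\n");

      for (;;)
          pause();  /* ... main request processing loop ... */
  }

For daemons not wanting to link against libsystemd (one of the
criticisms raised in that article), the same "READY=1" datagram can
be sent by hand to the AF_UNIX socket named in the NOTIFY_SOCKET
environment variable.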

Apparently, this is a recurring topic because, to this day, the
problem hasn't been overcome in a generic enough way; see NetBSD as
another example:
https://mail-index.netbsd.org/tech-userlevel/2014/01/28/msg008401.html

This situation, caused by a past lack of interest in getting things
right, plus OS ecosystem segmentation working against any conceivable
attempt to unify on a portable solution, is pretty unsettling :-/

[*] see https://en.wikipedia.org/wiki/Open_Cluster_Framework

> To elaborate more: historically, it's customary to perform a double
> fork in daemons to make them as isolated from controlling terminals
> and whatnot as possible.  But it may not be desirable to perform
> anything security sensitive prior to at least the first fork, hence
> with "forking" you've already lost the precision of the "ready"
> indication, unless there is some further synchronization between
> the parent and its child processes (I have yet to see that in
> practice).  So I'd say, unless the daemon is specifically
> fine-tuned, both forking and dbus types of services are bound to
> carry some amount of asynchronicity as mentioned.  To the distaste
> of said service supervisors, which strive to maximize service
> usefulness over a considerable timeframe -- way more than ticking
> the "should be running OK because it got started by me without any
> early failure" checkbox.
> 
> The main issue (though sometimes workable) with the sd_notify
> approach is that in your composite application you may not have a
> direct "consider me ready" hook throughout the underlying stack,
> and tying it to the processing of the first request is out of the
> question because its timing is not guaranteed (if it ever arrives
> at all).

Actually, quite a few other downsides of that approach are mentioned
in that article.
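
As for the parent/child synchronization I mentioned above and have
yet to see in practice, a rough sketch of what a "forking" type
daemon could do is below: the original parent only exits (and thus
lets the supervisor consider the service started) once the grandchild
reports over a pipe that its initialization actually succeeded.  The
helper name is made up purely for illustration:

  /* Rough sketch: double fork where the original parent exits only
   * after the grandchild reports readiness through a pipe. */
  #include <stdlib.h>
  #include <unistd.h>

  /* hypothetical placeholder for the real setup (bind sockets, ...) */
  static int initialize_service(void) { return 0; }

  int main(void)
  {
      int pfd[2];
      if (pipe(pfd) < 0)
          return EXIT_FAILURE;

      pid_t pid = fork();
      if (pid < 0)
          return EXIT_FAILURE;

      if (pid > 0) {                  /* original parent */
          char status;
          close(pfd[1]);
          /* block until the grandchild reports, or the pipe closes */
          if (read(pfd[0], &status, 1) != 1 || status != 0)
              return EXIT_FAILURE;    /* startup failed */
          return EXIT_SUCCESS;        /* only now "successfully started" */
      }

      /* first child: detach and fork again */
      close(pfd[0]);
      setsid();
      pid = fork();
      if (pid != 0)
          _exit(pid > 0 ? EXIT_SUCCESS : EXIT_FAILURE);

      /* grandchild: perform the actual initialization */
      char ok = (initialize_service() == 0) ? 0 : 1;
      write(pfd[1], &ok, 1);          /* wake the waiting parent */
      close(pfd[1]);
      if (ok != 0)
          _exit(EXIT_FAILURE);

      for (;;)
          pause();                    /* main request processing loop */
  }

Note this only narrows the race for the "forking" type; it is still
hand-rolled plumbing that sd_notify would otherwise provide for free.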

> Sorry, didn't add much to the discussion; getting rid of
> asynchronicities is tough in a world that was never widely
> interested in a poll/check-less "true ready" state.

And now I am adding: it is, in fact, mildly interested, but only per
partes, within isolated islands.

-- 
Poki