[ClusterLabs Developers] RA as a systemd wrapper -- the right way?

Thu Sep 22 09:49:11 EDT 2016

Ken Gaillot <kgaillot at redhat.com> wrote:
> On 09/21/2016 03:25 PM, Adam Spiers wrote:
> > Jan Pokorný <jpokorny at redhat.com> wrote:
> >> Just thinking aloud before the can is open.
> > 
> > Thanks for sharing - I'm very interested to hear your ideas on this,
> > because I was thinking along somewhat similar lines for the
> > openstack-resource-agents repository which I maintain.
> > 
> > Currently the OpenStack RAs duplicate much of the logic and config of
> > corresponding systemd / LSB init scripts for starting / stopping
> > OpenStack services and checking their status.  The main difference is
> > that RAs also have a "monitor" action which can check the health of
> > the service at application level, e.g. via HTTP rather than a naive
> > "is this pid running" kind of check.
> > 
> > This duplication causes issues with portability between Linux
> > distributions, since each distribution has a slightly different way of
> > starting and stopping the services.  It also results in subtly
> > different behaviour for OpenStack clouds depending on whether or not
> > they are deployed in HA mode using Pacemaker.
> > 
> > As a result I have been thinking about the idea of changing the
> > start/stop/status actions of these RAs so that they wrap around
> > service(8) (which would be even more portable across distros than
> > systemctl).
> > 
> > The primary difference with your approach is that we probably wouldn't
> > need to make the RAs dynamically create any systemd configuration, since
> > that would already be provided by the packages which install the OpenStack
> > services.  But then AFAIK none of the OpenStack services use the
> > multi-instance feature of systemd (foo@{one,two,three,etc}.service).
> 
> The main complication I see is that pacemaker expects OCF agents to
> return success only after an action is complete. For example, start
> should not return until the service is fully active. I believe systemctl
> does not behave this way; rather, it initiates the action and returns
> immediately.

But that's trivial to work around: polling via "service foo status"
after "service foo start" converts it back from an asynchronous
operation to a synchronous one.
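
As a rough sketch of what I mean (untested against any real RA; the
function name start_and_wait and its arguments are made up for
illustration, and it assumes the status command exits 0 once the
service is actually up):

```shell
#!/bin/sh
# Hypothetical helper, not part of any existing resource agent.
# Runs a start command, then polls a status command once per second
# until it succeeds or TIMEOUT seconds have elapsed.
# Usage: start_and_wait TIMEOUT START_CMD STATUS_CMD
start_and_wait() {
    timeout=$1
    start_cmd=$2
    status_cmd=$3

    # Kick off the (possibly asynchronous) start; bail out if it fails.
    sh -c "$start_cmd" || return 1    # 1 == OCF_ERR_GENERIC

    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Assumption: exit code 0 from the status command means "running".
        if sh -c "$status_cmd" >/dev/null 2>&1; then
            return 0                  # 0 == OCF_SUCCESS
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done

    # Service never reported as running within the timeout.
    return 1
}
```

An RA's start action could then call something like
start_and_wait 60 "service foo start" "service foo status"
where the 60 seconds is an arbitrary figure; in practice Pacemaker's
own operation timeout would bound the wait anyway.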

> Pacemaker's native systemd integration has a lot of workarounds for
> quirks in systemd behavior (and more every release). I'm not sure
> moving/duplicating that logic to the RA is a good approach.

What other quirks are there?