[ClusterLabs Developers] RA as a systemd wrapper -- the right way?

Thu Sep 22 11:39:41 EDT 2016

Ken Gaillot <kgaillot at redhat.com> wrote:
> On 09/22/2016 08:49 AM, Adam Spiers wrote:
> > Ken Gaillot <kgaillot at redhat.com> wrote:
> >> On 09/21/2016 03:25 PM, Adam Spiers wrote:
> >>> As a result I have been thinking about the idea of changing the
> >>> start/stop/status actions of these RAs so that they wrap around
> >>> service(8) (which would be even more portable across distros than
> >>> systemctl).
> >>>
> >>> The primary difference with your approach is that we probably wouldn't
> >>> need to make the RAs dynamically create any systemd configuration, since
> >>> that would already be provided by the packages which install the OpenStack
> >>> services.  But then AFAIK none of the OpenStack services use the
> >>> multi-instance feature of systemd (foo@{one,two,three,etc}.service).
> >>
> >> The main complication I see is that pacemaker expects OCF agents to
> >> return success only after an action is complete. For example, start
> >> should not return until the service is fully active. I believe systemctl
> >> does not behave this way, rather it initiates the action and returns
> >> immediately.
> > 
> > But that's trivial to work around: polling via "service foo status"
> > after "service foo start" converts it back from an asynchronous
> > operation to a synchronous one.
> 
> Yes, that's exactly what pacemaker does now: start/stop, then every two
> seconds, poll the status.
> 
> However, I'm currently working on a project to change that, so that we
> use DBus signalling to be notified when the job completes, rather than
> (or in addition to) polling.
> 
> The reason is twofold: the two-second wait can be an unnecessary
> recovery delay in some cases; and (at least from the DBus API, not sure
> about systemctl status) there's no reliable way to distinguish "service
> is inactive because the start didn't work properly" from "service is
> inactive because systemd has some slow-starting dependencies of its own
> to start first".

OK, that makes sense - thanks.
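
For reference, the start-then-poll workaround discussed above could be
sketched roughly as follows. This is only an illustration, not
Pacemaker's code: the helper names (service_cmd, start_and_wait) are
made up, the two-second interval is taken from the description above,
and it assumes "service foo status" follows the LSB convention of
exiting 0 when the service is running.

```python
import subprocess
import time

OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def service_cmd(name, action):
    """Run `service <name> <action>` and return its exit status."""
    return subprocess.call(["service", name, action])


def start_and_wait(name, timeout, interval=2.0, run=service_cmd,
                   sleep=time.sleep, clock=time.monotonic):
    """Start a service, then poll its status until it is up or we time out.

    `run`, `sleep` and `clock` are injectable so the loop can be tested
    without touching a real init system.
    """
    if run(name, "start") != 0:
        return OCF_ERR_GENERIC
    deadline = clock() + timeout
    while True:
        # Assumption: `status` exits 0 when the service is running (LSB).
        if run(name, "status") == 0:
            return OCF_SUCCESS
        if clock() >= deadline:
            return OCF_ERR_GENERIC
        sleep(interval)
```

Note that a loop like this cannot tell "start failed" apart from "still
waiting on slow dependencies"; it can only give up at the deadline.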

> >> Pacemaker's native systemd integration has a lot of workarounds for
> >> quirks in systemd behavior (and more every release). I'm not sure
> >> moving/duplicating that logic to the RA is a good approach.
> > 
> > What other quirks are there?
> 
> When pacemaker starts a systemd service, it creates a unit override in
> /run/systemd/system/<agent>.service.d/50-pacemaker.conf, with these
> overrides (and removes the file when stopping the resource):
> 
> * It prefixes the description with "Cluster Controlled" (e.g. "Postfix
> Mail Transport Agent" -> "Cluster Controlled Postfix Mail Transport
> Agent"). This gives a clear indicator in systemd messages in the syslog
> that it's a cluster resource.
> 
> * "Before=pacemaker.service": This ensures that when someone shuts down
> the system via systemd, systemd doesn't stop pacemaker before pacemaker
> can stop the resource.
> 
> * "Restart=no": This ensures that pacemaker stays in control of
> responding to service failures.

Yes, I was aware of that, and you're right that my approach of making
the RA wrap service(8) or systemctl(8) would need to duplicate this
functionality - *unless* the creation of the unit override could be
moved out of Pacemaker's C code into a shell script which both
Pacemaker and external RAs which want to adopt this wrapping technique
could call.
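
To make that concrete, a shared helper might look something like the
sketch below. It only reproduces the three overrides and the file
location Ken listed above; the function names and the `root` parameter
are hypothetical, and it is written in Python purely for illustration
(a real helper for RAs would more likely be shell).

```python
import os

DEFAULT_ROOT = "/run/systemd/system"


def override_path(unit, root=DEFAULT_ROOT):
    """Return the drop-in path for a unit name such as 'postfix.service'."""
    return os.path.join(root, unit + ".d", "50-pacemaker.conf")


def override_text(description):
    """Build the drop-in contents from the unit's original description."""
    return (
        "[Unit]\n"
        f"Description=Cluster Controlled {description}\n"
        "Before=pacemaker.service\n"
        "\n"
        "[Service]\n"
        "Restart=no\n"
    )


def write_override(unit, description, root=DEFAULT_ROOT):
    """Create the drop-in before starting the resource; return its path."""
    path = override_path(unit, root)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(override_text(description))
    # systemd only notices the new file after `systemctl daemon-reload`;
    # that call is deliberately left to the caller in this sketch.
    return path


def remove_override(unit, root=DEFAULT_ROOT):
    """Delete the drop-in when stopping the resource (no error if absent)."""
    try:
        os.remove(override_path(unit, root))
    except FileNotFoundError:
        pass
```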

> Additionally:
> 
> * Pacemaker uses intelligent timeout values (based on cluster
> configuration) when making systemd calls.

I guess I'd need more details to fully understand this, but couldn't
those intelligently chosen timeout values be passed to the RA if
necessary?  Although that does put a bit of a dampener on my hope of
using service(8) to remain agnostic to whichever pid-1 system happened
to be in use on the current machine.  Having said that, maybe everyone
in the OpenStack (HA) community has already moved to systemd by now
anyway.
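
On passing timeouts to the RA: as far as I know, Pacemaker already
exports the configured operation timeout to OCF agents in milliseconds
via the OCF_RESKEY_CRM_meta_timeout environment variable, so a wrapper
could derive its own deadline from that. A minimal sketch, in which the
function name and the 20-second fallback are arbitrary assumptions:

```python
import os


def action_timeout_seconds(environ=os.environ, default=20.0):
    """Return the operation timeout in seconds for the current action.

    Assumes the cluster exports the timeout in milliseconds as
    OCF_RESKEY_CRM_meta_timeout; falls back to `default` (an arbitrary
    value chosen for this sketch) if it is missing or malformed.
    """
    raw = environ.get("OCF_RESKEY_CRM_meta_timeout")
    try:
        millis = int(raw)
    except (TypeError, ValueError):
        return default
    return millis / 1000.0 if millis > 0 else default
```

Whether that value matches the "intelligent" timeouts Pacemaker uses
internally for its systemd calls is a separate question.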

> * Pacemaker interprets/remaps systemd return status as needed. For
> example, a stop followed by a status poll that returns "OK" means the
> service is still running. Fairly obvious, but there are a lot of cases
> that need to be handled.

Other than (obviously) start followed by status, what other cases are
there?

These all sound like generic problems which could be solved once, for
all wrapper RAs, via a simple shell library.  I'd happily
maintain this in openstack-resource-agents, although TBH it would
probably belong in resource-agents if anywhere.
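
As a rough idea of what such a library would have to encode, here is a
sketch of the status interpretation Ken mentions. The OCF and LSB exit
codes are the standard ones, but the function names and the choice to
treat LSB codes other than 0 and 3 as a generic error are assumptions
for illustration, not Pacemaker's actual remapping table.

```python
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7

LSB_STATUS_RUNNING = 0
LSB_STATUS_NOT_RUNNING = 3


def action_complete(action, running):
    """Return True once a status poll confirms that `action` took effect.

    After "stop", a status of "running" means the stop has NOT finished,
    even though the status command itself reported success.
    """
    if action == "start":
        return running
    if action == "stop":
        return not running
    raise ValueError(f"unsupported action: {action}")


def monitor_result(lsb_status):
    """Map an LSB `status` exit code to an OCF monitor exit code."""
    if lsb_status == LSB_STATUS_RUNNING:
        return OCF_SUCCESS
    if lsb_status == LSB_STATUS_NOT_RUNNING:
        return OCF_NOT_RUNNING
    # Assumption: dead-with-pidfile, dead-with-lockfile and unknown
    # states are all reported as a generic failure in this sketch.
    return OCF_ERR_GENERIC
```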

> All of these were added gradually over the past few years, so I'd expect
> the list to grow over the next few years.

Well, hopefully they could be grown in a way which also supported
wrapper RAs :-)

Alternatively, if you think that there's a better solution than this
wrapper RA idea, I'm all ears.  The two main problems are essentially:

  1. RAs duplicate a whole bunch of logic / config already provided
     by vendor packages and systemd service units.

  2. RAs have a "monitor" action which can do proper application-level
     monitoring (e.g. HTTP pings), whereas apparently systemd has
     nothing equivalent.
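
To illustrate point 2, a wrapper RA's monitor action could combine the
unit state with an application-level probe along these lines. This is
a sketch only: http_ok and monitor are hypothetical names, the URL is
whatever health endpoint the service happens to expose, and treating
"unit active but probe failed" as a generic error is my assumption.

```python
import urllib.error
import urllib.request

OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def http_ok(url, timeout=5.0):
    """Return True if a GET on `url` yields a 2xx response in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts and HTTP error statuses land here.
        return False


def monitor(unit_active, url, probe=http_ok):
    """Combine the init system's view with an application-level check.

    `unit_active` is whatever the pid-level check reported; `probe` is
    injectable so the decision logic can be tested without a server.
    """
    if not unit_active:
        return OCF_NOT_RUNNING
    return OCF_SUCCESS if probe(url) else OCF_ERR_GENERIC
```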

So currently we are forced to choose between a) using systemd
Pacemaker resources, which only get naive pid-level monitoring, and
b) using RAs, which get proper monitoring but have to duplicate a
whole load of stuff which systemd already does nicely.

If I'm missing something, or you can think of a better alternative,
then please tell me!