[ClusterLabs Developers] RA as a systemd wrapper -- the right way?

Fri Oct 21 20:40:08 EDT 2016

Ken Gaillot <kgaillot at redhat.com> wrote:
> On 09/26/2016 09:15 AM, Adam Spiers wrote:
> > [Sending this as a separate mail, since the last one was already (too)
> > long and focused on specific details, whereas this one takes a step
> > back to think about the bigger picture again.]
> > 
> > Adam Spiers <aspiers at suse.com> wrote:
> >>>>>>> On 09/21/2016 03:25 PM, Adam Spiers wrote:
> >>>>>>>> As a result I have been thinking about the idea of changing the
> >>>>>>>> start/stop/status actions of these RAs so that they wrap around
> >>>>>>>> service(8) (which would be even more portable across distros than
> >>>>>>>> systemctl).
> > 
> > [snipped discussion of OCF wrapper RA idea]
> > 
> >> The fact that I don't see any problems where you apparently do makes
> >> me deeply suspicious of my own understanding ;-)  Please tell me what
> >> I'm missing.
> > 
> > [snipped]
> > 
> > To clarify: I am not religiously defending this "wrapper OCF RA" idea
> > of mine to the death.  It certainly sounds like it's not as clean as I
> > originally thought.  But I'm still struggling to see any dealbreaker.
> > 
> > OTOH, I'm totally open to better ideas.
> > 
> > For example, could Pacemaker be extended to allow hybrid resources,
> > where some actions (such as start, stop, status) are handled by (say)
> > the systemd backend, and other actions (such as monitor) are handled
> > by (say) the OCF backend?  Then we could cleanly rely on dbus for
> > collaborating with systemd, whilst adding arbitrarily complex
> > monitoring via OCF RAs.  That would have several advantages:
> > 
> > 1. Get rid of grotesque layering violations and maintenance boundaries
> >    where the OCF RA duplicates knowledge of all kinds of things which
> >    are distribution-specific, e.g.:
> > 
> >      https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/apache#L56
> 
> A simplified agent will likely still need distro-specific intelligence
> to do even a limited subset of actions, so I'm not sure there's a gain
> there.

What distro-specific intelligence would it need?  If the OCF RA were
only responsible for monitoring, it wouldn't need to know a lot of the
things which are only required for starting / stopping the service and
checking whether it's running, e.g.:

  - Name of the daemon executable
  - uid/gid it should be started as
  - Daemon CLI arguments
  - Location of pid file

Instead, an OCF RA responsible only for monitoring would need to know
only how to talk to the service, which is not typically
distro-specific; in the REST API case, it only needs to know the
endpoint URL, which would be configured via Pacemaker resource
parameters anyway.
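
To make that concrete, here is a rough sketch of what a monitor-only
check for a REST service might look like.  It is written in Python
purely for brevity and is not taken from any existing RA.  The exit
codes are the standard OCF ones; treating any 2xx response as healthy
and the helper names are assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The service answered, but with an error status (4xx/5xx).
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout, etc.
        return None


def monitor(url, fetch=http_status):
    """Map the result of one HTTP probe onto an OCF exit code.

    Assumption for this sketch: any 2xx response means "healthy".
    """
    status = fetch(url)
    if status is None:
        return OCF_NOT_RUNNING
    if 200 <= status < 300:
        return OCF_SUCCESS
    return OCF_ERR_GENERIC
```

A real agent would read the URL from a resource parameter (Pacemaker
passes these in as OCF_RESKEY_* environment variables; the parameter
name would be up to the RA) and exit with the value monitor() returns.
Nothing in it depends on daemon paths, users, or pid files.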

> > 2. Drastically simplify OCF RAs by delegating start/stop/status etc.
> >    to systemd, thereby increasing readability and reducing maintenance
> >    burden.
> > 
> > 3. OCF RAs are more likely to work out of the box with any distro,
> >    or at least require less work to get working.
> > 
> > 4. Services behave more similarly regardless of whether managed by
> >    Pacemaker or the standard pid 1 service manager.  For example, they
> >    will always use the same pidfile, run as the same user, in the
> >    right cgroup, be invoked with the same arguments etc.
> > 
> > 5. Pacemaker can still monitor services accurately at the
> >    application-level, rather than just relying on naive pid-level
> >    monitoring.
> > 
> > Or is this a terrible idea? ;-)
> 
> I considered this, too. I don't think it's a terrible idea, but it does
> pose its own questions.
> 
> * What hybrid actions should be allowed? It seems dangerous to allow
> starting from one code base and stopping from another, or vice versa,
> and really dangerous to allow something like migrate_to/migrate_from to
> be reimplemented. At one extreme, we allow anything and leave that
> responsibility on the user; at the other, we only allow higher-level
> monitors (i.e. using OCF_CHECK_LEVEL) to be hybridized.

Just monitors would be good enough for me.

> * Should the wrapper's actions be done instead of, or in addition to,
> the main resource's actions? Or maybe even allow the user to choose? I
> could see some wrappers intended to replace the native handling, and
> others to supplement it.

For my use case, in addition, because the only motivation is to
delegate start/stop/status to systemd (as happens currently with
systemd:* RAs) whilst retaining the ability to do service-level
testing of the resource via the OCF RA.  So it wouldn't really be a
wrapper, but rather an extension.

In contrast, with the wrapper approach, it sounds like the delegation
would have to happen via systemctl rather than via Pacemaker's dbus
code.  And if systemctl start/stop really are asynchronous (i.e. they
can return before the operation has completed), the wrapper would need
to wrap these start/stop calls in a polling loop as previously
mentioned, in order to make them synchronous and blocking (which is
the behaviour I think most people would expect).
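
For illustration, the polling loop I have in mind would be roughly the
following (again Python for brevity, and only a sketch).  It assumes
that "systemctl start" may return before the service is usable, and
that polling "systemctl is-active" is an acceptable readiness test;
the function names, timeout and interval are made up for the example.

```python
import subprocess
import time

# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def systemctl(*args):
    """Run systemctl with the given arguments and return its exit status."""
    return subprocess.run(["systemctl", *args], check=False).returncode


def start_and_wait(unit, timeout=30.0, interval=1.0, run=systemctl,
                   sleep=time.sleep, clock=time.monotonic):
    """Ask systemd to start unit, then block until it reports active.

    Returns OCF_SUCCESS once the unit is active, or OCF_ERR_GENERIC if
    the start request fails or the deadline passes first.
    """
    if run("start", unit) != 0:
        return OCF_ERR_GENERIC
    deadline = clock() + timeout
    while True:
        # "is-active" exits 0 only when the unit is in the active state.
        if run("is-active", "--quiet", unit) == 0:
            return OCF_SUCCESS
        if clock() >= deadline:
            return OCF_ERR_GENERIC
        sleep(interval)
```

The stop action would be the mirror image: issue "systemctl stop" and
poll until is-active no longer reports success.  Pacemaker's own
operation timeout would still apply on top of whatever the wrapper
does.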

> * The answers to the above will help decide whether the wrapper is a
> separate resource (with its own parameters, operations, timeouts, etc.),
> or just a property of the main resource.
> 
> * If we allow anything other than monitors to be hybridized, I think we
> get into a pacemaker-specific implementation. I don't think it's
> feasible to include this in the OCF standard -- it would essentially
> mandate pacemaker's "resource class" mechanism on all OCF users (which
> is beyond OCF's scope), and would likely break manual/scripted use
> altogether. We could possibly modify OCF so that no
> actions are mandatory, and it's up to the OCF-using software to verify
> that any actions it requires are supported. Or maybe wrappers just
> implement some actions as no-ops, and it's up to the user to know the
> limitations.

Sure.  Hopefully you will be in Barcelona so we can discuss more?