[ClusterLabs Developers] RA as a systemd wrapper -- the right way?

Mon Sep 26 19:16:59 CEST 2016

On 09/26/2016 09:15 AM, Adam Spiers wrote:
> [Sending this as a separate mail, since the last one was already (too)
> long and focused on specific details, whereas this one takes a step
> back to think about the bigger picture again.]
> 
> Adam Spiers <aspiers at suse.com> wrote:
>>>>>>> On 09/21/2016 03:25 PM, Adam Spiers wrote:
>>>>>>>> As a result I have been thinking about the idea of changing the
>>>>>>>> start/stop/status actions of these RAs so that they wrap around
>>>>>>>> service(8) (which would be even more portable across distros than
>>>>>>>> systemctl).
> 
> [snipped discussion of OCF wrapper RA idea]
> 
>> The fact that I don't see any problems where you apparently do makes
>> me deeply suspicious of my own understanding ;-)  Please tell me what
>> I'm missing.
> 
> [snipped]
> 
> To clarify: I am not religiously defending this "wrapper OCF RA" idea
> of mine to the death.  It certainly sounds like it's not as clean as I
> originally thought.  But I'm still struggling to see any dealbreaker.
> 
> OTOH, I'm totally open to better ideas.
> 
> For example, could Pacemaker be extended to allow hybrid resources,
> where some actions (such as start, stop, status) are handled by (say)
> the systemd backend, and other actions (such as monitor) are handled
> by (say) the OCF backend?  Then we could cleanly rely on dbus for
> collaborating with systemd, whilst adding arbitrarily complex
> monitoring via OCF RAs.  That would have several advantages:
> 
> 1. Get rid of grotesque layering violations and maintenance boundaries
>    where the OCF RA duplicates knowledge of all kinds of things which
>    are distribution-specific, e.g.:
> 
>      https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/apache#L56

A simplified agent will likely still need distro-specific intelligence
to do even a limited subset of actions, so I'm not sure there's a gain
there.
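A minimal sketch of the wrapper idea under discussion makes this point concrete. It assumes an agent that delegates start/stop to `systemctl` and uses `systemctl is-active` for status/monitor, translating the results into standard OCF return codes. The agent parameter `unit` (read from the hypothetical `OCF_RESKEY_unit`) and the injectable runner are illustrative assumptions, not part of any existing agent. Note that the unit name itself (apache2.service on one distro, httpd.service on another) is the distro-specific knowledge that remains in the agent.

```python
import subprocess
from typing import Callable, Mapping, Sequence

# Standard OCF return codes, as defined by the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_ERR_CONFIGURED = 6
OCF_NOT_RUNNING = 7

# A runner takes an argv list and returns the process exit status.
Runner = Callable[[Sequence[str]], int]


def run_cmd(argv: Sequence[str]) -> int:
    """Run a command for real and return its exit status."""
    return subprocess.run(list(argv), check=False).returncode


def ocf_action(action: str, unit: str, run: Runner = run_cmd) -> int:
    """Map an OCF action onto systemctl and translate the result to OCF codes."""
    if action in ("start", "stop"):
        # systemd does the real work, so the service gets the same pidfile,
        # user, cgroup and arguments as when started outside Pacemaker.
        rc = run(["systemctl", action, unit])
        return OCF_SUCCESS if rc == 0 else OCF_ERR_GENERIC
    if action in ("status", "monitor"):
        # 'systemctl is-active' exits 0 only if the unit is active.
        # This is pid-level liveness only (a failed unit is simply reported
        # as not running); an application-level probe would go here.
        rc = run(["systemctl", "is-active", "--quiet", unit])
        return OCF_SUCCESS if rc == 0 else OCF_NOT_RUNNING
    # meta-data, validate-all, promote, migrate_* etc. are not sketched.
    return OCF_ERR_UNIMPLEMENTED


def main(argv: Sequence[str], env: Mapping[str, str], run: Runner = run_cmd) -> int:
    """Entry point: action arrives as argv[1], parameters as OCF_RESKEY_*."""
    # 'unit' is a hypothetical agent parameter; choosing its value per distro
    # is the knowledge a "simplified" agent still needs.
    unit = env.get("OCF_RESKEY_unit")
    if not unit:
        return OCF_ERR_CONFIGURED
    action = argv[1] if len(argv) > 1 else ""
    return ocf_action(action, unit, run)
```

An executable agent would end with `sys.exit(main(sys.argv, os.environ))`; the runner is injectable only so the mapping can be exercised without a live systemd.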

> 2. Drastically simplify OCF RAs by delegating start/stop/status etc.
>    to systemd, thereby increasing readability and reducing maintenance
>    burden.
> 
> 3. OCF RAs are more likely to work out of the box with any distro,
>    or at least require less work to get working.
> 
> 4. Services behave more similarly regardless of whether managed by
>    Pacemaker or the standard pid 1 service manager.  For example, they
>    will always use the same pidfile, run as the same user, in the
>    right cgroup, be invoked with the same arguments etc.
> 
> 5. Pacemaker can still monitor services accurately at the
>    application-level, rather than just relying on naive pid-level
>    monitoring.
> 
> Or is this a terrible idea? ;-)

I considered this, too. I don't think it's a terrible idea, but it does
pose its own questions.

* What hybrid actions should be allowed? It seems dangerous to allow
starting from one code base and stopping from another, or vice versa,
and really dangerous to allow something like migrate_to/migrate_from to
be reimplemented. At one extreme, we allow anything and leave that
responsibility on the user; at the other, we only allow higher-level
monitors (i.e. using OCF_CHECK_LEVEL) to be hybridized.

* Should the wrapper's actions be done instead of, or in addition to,
the main resource's actions? Or maybe even allow the user to choose? I
could see some wrappers intended to replace the native handling, and
others to supplement it.

* The answers to the above will help decide whether the wrapper is a
separate resource (with its own parameters, operations, timeouts, etc.),
or just a property of the main resource.

* If we allow anything other than monitors to be hybridized, I think we
get into a pacemaker-specific implementation. I don't think it's
feasible to include this in the OCF standard -- it would essentially
mandate pacemaker's "resource class" mechanism on all OCF users (which
is beyond OCF's scope), and would likely break manual/scripted use
altogether. We could possibly modify OCF so that no actions are
mandatory for agents, and it's up to the OCF-using software to verify
that any actions it requires are supported. Or maybe wrappers just
implement some actions as no-ops, and it's up to the user to know the
limitations.
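The questions above can be made concrete with a small model of a hybrid resource. This is a sketch under stated assumptions, not an existing Pacemaker feature: the class and field names, the "replace"/"supplement" modes and the backend labels (such as the hypothetical `ocf:example:deepcheck`) are all invented for illustration. It encodes the most conservative option discussed, where only monitors may be hybridized and basic (OCF_CHECK_LEVEL 0) monitors always stay with the primary backend. It also includes a check that the managing software could use to verify that the actions it requires are supported.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, FrozenSet, Iterable, List, Tuple


class Mode(Enum):
    REPLACE = "replace"        # wrapper action runs instead of the primary one
    SUPPLEMENT = "supplement"  # wrapper action runs in addition to the primary one


# Most conservative policy: only monitors may be hybridized, so start, stop,
# migrate_to/migrate_from etc. always stay with the primary backend.
HYBRIDIZABLE: FrozenSet[str] = frozenset({"monitor"})


@dataclass
class HybridResource:
    primary: str  # backend owning the resource, e.g. "systemd"
    wrapper: str  # backend providing the extra actions (hypothetical)
    overrides: Dict[str, Mode] = field(default_factory=dict)

    def validate(self) -> List[str]:
        """Reject overrides for actions the policy does not allow."""
        return [f"action {a!r} may not be hybridized"
                for a in sorted(self.overrides) if a not in HYBRIDIZABLE]

    def plan(self, action: str, check_level: int = 0) -> List[Tuple[str, str]]:
        """Return the (backend, action) steps to run, in order.

        Assumes validate() has already been called and returned no errors.
        """
        mode = self.overrides.get(action)
        # Basic monitors (OCF_CHECK_LEVEL 0) are always left to the primary.
        if mode is None or (action == "monitor" and check_level == 0):
            return [(self.primary, action)]
        if mode is Mode.REPLACE:
            return [(self.wrapper, action)]
        return [(self.primary, action), (self.wrapper, action)]


def missing_actions(required: Iterable[str], supported: Iterable[str]) -> List[str]:
    """Actions the managing software needs but the agent does not provide."""
    return sorted(set(required) - set(supported))
```

Keeping the wrapper as a property of the main resource (rather than a separate resource with its own timeouts) is just one of the choices raised above; the model would need per-action parameters if the wrapper became a resource in its own right.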