[ClusterLabs Developers] RA as a systemd wrapper -- the right way?

Mon Oct 24 18:26:03 EDT 2016

On 10/21/2016 07:40 PM, Adam Spiers wrote:
> Ken Gaillot <kgaillot at redhat.com> wrote:
>> On 09/26/2016 09:15 AM, Adam Spiers wrote:
>>> [Sending this as a separate mail, since the last one was already (too)
>>> long and focused on specific details, whereas this one takes a step
>>> back to think about the bigger picture again.]
>>>
>>> Adam Spiers <aspiers at suse.com> wrote:
>>>>>>>>> On 09/21/2016 03:25 PM, Adam Spiers wrote:
>>>>>>>>>> As a result I have been thinking about the idea of changing the
>>>>>>>>>> start/stop/status actions of these RAs so that they wrap around
>>>>>>>>>> service(8) (which would be even more portable across distros than
>>>>>>>>>> systemctl).
>>>
>>> [snipped discussion of OCF wrapper RA idea]
>>>
>>>> The fact that I don't see any problems where you apparently do makes
>>>> me deeply suspicious of my own understanding ;-)  Please tell me what
>>>> I'm missing.
>>>
>>> [snipped]
>>>
>>> To clarify: I am not religiously defending this "wrapper OCF RA" idea
>>> of mine to the death.  It certainly sounds like it's not as clean as I
>>> originally thought.  But I'm still struggling to see any dealbreaker.
>>>
>>> OTOH, I'm totally open to better ideas.
>>>
>>> For example, could Pacemaker be extended to allow hybrid resources,
>>> where some actions (such as start, stop, status) are handled by (say)
>>> the systemd backend, and other actions (such as monitor) are handled
>>> by (say) the OCF backend?  Then we could cleanly rely on dbus for
>>> collaborating with systemd, whilst adding arbitrarily complex
>>> monitoring via OCF RAs.  That would have several advantages:
>>>
>>> 1. Get rid of grotesque layering violations and maintenance boundaries
>>>    where the OCF RA duplicates knowledge of all kinds of things which
>>>    are distribution-specific, e.g.:
>>>
>>>      https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/apache#L56
>>
>> A simplified agent will likely still need distro-specific intelligence
>> to do even a limited subset of actions, so I'm not sure there's a gain
>> there.
> 
> What distro-specific intelligence would it need?  If the OCF RA was
> only responsible for monitoring, it wouldn't need to know a lot of the
> things which are only required for starting / stopping the service and
> checking whether it's running, e.g.:
> 
>   - Name of the daemon executable
>   - uid/gid it should be started as
>   - Daemon CLI arguments
>   - Location of pid file
> 
> In contrast, an OCF RA only responsible for monitoring would only need
> to know how to talk to the service, which is not typically
> distro-specific; in the REST API case, it only needs to know the endpoint
> URL, which would be configured via Pacemaker resource parameters anyway.

If you're only talking about monitors, that does simplify things. As you
mention, you'd still need to configure resource parameters that would
only be relevant to the enhanced monitor action. Other actions might need
some of the same values and get them elsewhere, so there's the minor
admin complication of setting the same value in multiple places.
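
To make the monitor-only case concrete, here is a rough sketch (in
Python purely for illustration) of what such an action could look like
for a REST service. The exit codes are the usual OCF ones; the parameter
name endpoint_url and the choice to report an unreachable endpoint as
"not running" are assumptions of mine, not anything an existing agent
does.

```python
import urllib.error
import urllib.request

# Standard OCF exit codes used by resource agents.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def monitor(endpoint_url, timeout=10.0, opener=urllib.request.urlopen):
    """Service-level monitor: probe the endpoint and map the result to an OCF code.

    endpoint_url is assumed to come from a Pacemaker resource parameter
    (hypothetical name: OCF_RESKEY_endpoint_url). opener is injectable so
    the function can be exercised without a live service.
    """
    try:
        with opener(endpoint_url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError:
        # The service answered, but with an error status (4xx/5xx).
        return OCF_ERR_GENERIC
    except (urllib.error.URLError, OSError):
        # Nothing is listening, or the request timed out (assumed mapping).
        return OCF_NOT_RUNNING
    return OCF_SUCCESS if 200 <= status < 300 else OCF_ERR_GENERIC
```

Note that nothing in it needs the daemon path, uid/gid, CLI arguments or
pid file; the endpoint URL is the only input.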

>>> 2. Drastically simplify OCF RAs by delegating start/stop/status etc.
>>>    to systemd, thereby increasing readability and reducing maintenance
>>>    burden.
>>>
>>> 3. OCF RAs are more likely to work out of the box with any distro,
>>>    or at least require less work to get working.
>>>
>>> 4. Services behave more similarly regardless of whether managed by
>>>    Pacemaker or the standard pid 1 service manager.  For example, they
>>>    will always use the same pidfile, run as the same user, in the
>>>    right cgroup, be invoked with the same arguments etc.
>>>
>>> 5. Pacemaker can still monitor services accurately at the
>>>    application-level, rather than just relying on naive pid-level
>>>    monitoring.
>>>
>>> Or is this a terrible idea? ;-)
>>
>> I considered this, too. I don't think it's a terrible idea, but it does
>> pose its own questions.
>>
>> * What hybrid actions should be allowed? It seems dangerous to allow
>> starting from one code base and stopping from another, or vice versa,
>> and really dangerous to allow something like migrate_to/migrate_from to
>> be reimplemented. At one extreme, we allow anything and leave that
>> responsibility on the user; at the other, we only allow higher-level
>> monitors (i.e. using OCF_CHECK_LEVEL) to be hybridized.
> 
> Just monitors would be good enough for me.

The tomcat RA (which could also benefit from something like this) would
extend start and stop as well, e.g. start = systemctl start plus some
bookkeeping.

>> * Should the wrapper's actions be done instead of, or in addition to,
>> the main resource's actions? Or maybe even allow the user to choose? I
>> could see some wrappers intended to replace the native handling, and
>> others to supplement it.
> 
> For my use case, in addition, because the only motivation is to
> delegate start/stop/status to systemd (as happens currently with
> systemd:* RAs) whilst retaining the ability to do service-level
> testing of the resource via the OCF RA.  So it wouldn't really be a
> wrapper, but rather an extension.
> 
> In contrast, with the wrapper approach, it sounds like the delegation
> would have to happen via systemctl not via Pacemaker's dbus code.  And
> if systemctl start/stop really are asynchronous non-blocking, the
> delegation would need to be able to wrap these start/stop calls in a
> polling loop as previously mentioned, in order to make them
> synchronous and blocking (which is the behaviour I think most people
> would expect).

Someone else suggested that systemctl is already blocking, which would
simplify the wrapper approach.
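
For the case where the call does return early, the polling loop could
look roughly like the sketch below. It forces the asynchronous case with
"systemctl start --no-block" and then polls "systemctl is-active". The
function names, the default timeout and the poll interval are all made
up for illustration, and a real agent would also have to respect the
operation timeout Pacemaker gives it.

```python
import subprocess
import time

OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def unit_is_active(unit):
    """Return True if `systemctl is-active` reports the unit as active."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit], check=False
    )
    return result.returncode == 0


def wait_until(predicate, timeout, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def start(unit, timeout=60.0):
    """Queue the start job without waiting, then block until the unit is active."""
    queued = subprocess.run(
        ["systemctl", "start", "--no-block", unit], check=False
    )
    if queued.returncode != 0:
        return OCF_ERR_GENERIC
    active = wait_until(lambda: unit_is_active(unit), timeout)
    return OCF_SUCCESS if active else OCF_ERR_GENERIC
```

A stop action would be the mirror image: queue the stop, then poll until
is-active no longer succeeds. If plain "systemctl start" already waits
for the job to finish, the wait_until() step is redundant and the
wrapper reduces to checking the exit status.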

>> * The answers to the above will help decide whether the wrapper is a
>> separate resource (with its own parameters, operations, timeouts, etc.),
>> or just a property of the main resource.
>>
>> * If we allow anything other than monitors to be hybridized, I think we
>> get into a pacemaker-specific implementation. I don't think it's
>> feasible to include this in the OCF standard -- it would essentially
>> mandate pacemaker's "resource class" mechanism on all OCF users (which
>> is beyond OCF's scope), and would likely break manual/scripted use
>> altogether. We could possibly modify OCF so that no
>> actions are mandatory, and it's up to the OCF-using software to verify
>> that any actions it requires are supported. Or maybe wrappers just
>> implement some actions as no-ops, and it's up to the user to know the
>> limitations.
> 
> Sure.  Hopefully you will be in Barcelona so we can discuss more?

Sadly, no :)

If systemctl blocks, the wrapper approach would be easiest -- the
wrapper can conform to the OCF standard, and requires no special
handling in pacemaker.

The hybrid approach would require some sort of "almost OCF" standard for
the extender, new syntax in pacemaker to configure it, and a good bit of
intelligence in pacemaker (e.g. recovery if one part of the hybrid fails
or times out).
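
As a very rough illustration of the kind of logic that would imply, here
is a sketch of the most conservative variant: only monitors can be
extended, the extender runs in addition to the native action, and a
failure in either part fails the operation. The names (run_hybrid,
native, extender) are hypothetical and do not correspond to anything in
pacemaker today; timeouts are left out entirely.

```python
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3

# Conservative policy from the discussion: only monitors may be extended.
HYBRIDIZABLE_ACTIONS = frozenset({"monitor"})


def run_hybrid(action, native, extender):
    """Run `action` on the native backend, then on the extender if allowed.

    native and extender are hypothetical mappings from an action name to
    a zero-argument callable returning an OCF-style exit code.
    """
    if action not in native:
        return OCF_ERR_UNIMPLEMENTED
    rc = native[action]()
    if rc != OCF_SUCCESS:
        # No point probing the application if the unit itself is not up.
        return rc
    if action in HYBRIDIZABLE_ACTIONS and action in extender:
        try:
            return extender[action]()
        except Exception:
            # Treat a crashing extender as a failed operation.
            return OCF_ERR_GENERIC
    return OCF_SUCCESS
```

Even this small version has to decide what a failing extender means for
the resource as a whole, which is the sort of question a plain OCF
wrapper never raises.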