[ClusterLabs Developers] RA as a systemd wrapper -- the right way?

Thu Nov 10 13:13:09 EST 2016

On 11/03/2016 02:37 PM, Adam Spiers wrote:
> Hi again Ken,
> 
> Sorry for the delayed reply, caused by Barcelona amongst other things ...
> 
> Ken Gaillot <kgaillot at redhat.com> wrote:
>> On 10/21/2016 07:40 PM, Adam Spiers wrote:
>>> Ken Gaillot <kgaillot at redhat.com> wrote:
>>>> On 09/26/2016 09:15 AM, Adam Spiers wrote:
>>>>> For example, could Pacemaker be extended to allow hybrid resources,
>>>>> where some actions (such as start, stop, status) are handled by (say)
>>>>> the systemd backend, and other actions (such as monitor) are handled
>>>>> by (say) the OCF backend?  Then we could cleanly rely on dbus for
>>>>> collaborating with systemd, whilst adding arbitrarily complex
>>>>> monitoring via OCF RAs.  That would have several advantages:
>>>>>
>>>>> 1. Get rid of grotesque layering violations and maintenance boundaries
>>>>>    where the OCF RA duplicates knowledge of all kinds of things which
>>>>>    are distribution-specific, e.g.:
>>>>>
>>>>>      https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/apache#L56
>>>>
>>>> A simplified agent will likely still need distro-specific intelligence
>>>> to do even a limited subset of actions, so I'm not sure there's a gain
>>>> there.
>>>
>>> What distro-specific intelligence would it need?  If the OCF RA was
>>> only responsible for monitoring, it wouldn't need to know a lot of the
>>> things which are only required for starting / stopping the service and
>>> checking whether it's running, e.g.:
>>>
>>>   - Name of the daemon executable
>>>   - uid/gid it should be started as
>>>   - Daemon CLI arguments
>>>   - Location of pid file
>>>
>>> In contrast, an OCF RA only responsible for monitoring would only need
>>> to know how to talk to the service, which is not typically
>>> distro-specific; in the REST API case, it only needs to know the endpoint
>>> URL, which would be configured via Pacemaker resource parameters anyway.
>>
>> If you're only talking about monitors, that does simplify things. As you
>> mention, you'd still need to configure resource parameters that would
>> only be relevant to the enhanced monitor action -- parameters that other
>> actions might also need, and get elsewhere, so there's the minor admin
>> complication of setting the same value in multiple places.
> 
> Which same value(s)?

Nothing particular in mind, just app-specific information that's
required in both the app's own configuration and in the resource
configuration. Even stuff like an IP address/port.

> In the OpenStack case (which is the only use case I have), I don't
> think this will happen, because the "monitor" action only needs to
> know the endpoint URL and associated credentials, which doesn't
> overlap with what the other actions need to know.  This separation of
> concerns feels right to me: the start/stop/status actions are
> responsible for managing the state of the service, and the monitor
> action is responsible for monitoring whether it's delivering what it
> should be.  It's just like the separation between admins and end
> users.
> 
>>>>> 2. Drastically simplify OCF RAs by delegating start/stop/status etc.
>>>>>    to systemd, thereby increasing readability and reducing maintenance
>>>>>    burden.
>>>>>
>>>>> 3. OCF RAs are more likely to work out of the box with any distro,
>>>>>    or at least require less work to get working.
>>>>>
>>>>> 4. Services behave more similarly regardless of whether managed by
>>>>>    Pacemaker or the standard pid 1 service manager.  For example, they
>>>>>    will always use the same pidfile, run as the same user, in the
>>>>>    right cgroup, be invoked with the same arguments etc.
>>>>>
>>>>> 5. Pacemaker can still monitor services accurately at the
>>>>>    application-level, rather than just relying on naive pid-level
>>>>>    monitoring.
>>>>>
>>>>> Or is this a terrible idea? ;-)
>>>>
>>>> I considered this, too. I don't think it's a terrible idea, but it does
>>>> pose its own questions.
>>>>
>>>> * What hybrid actions should be allowed? It seems dangerous to allow
>>>> starting from one code base and stopping from another, or vice versa,
>>>> and really dangerous to allow something like migrate_to/migrate_from to
>>>> be reimplemented. At one extreme, we allow anything and leave that
>>>> responsibility on the user; at the other, we only allow higher-level
>>>> monitors (i.e. using OCF_CHECK_LEVEL) to be hybridized.
>>>
>>> Just monitors would be good enough for me.
>>
>> The tomcat RA (which could also benefit from something like this) would
>> extend start and stop as well, e.g. start = systemctl start plus some
>> bookkeeping.
> 
> Ahh OK, interesting.  What kind of bookkeeping?

I don't remember ... something like node attributes or a pid/status
file. That could potentially be handled by separate OCF resources for
those bits, grouped with the systemd+monitor resource, but that would be
hacky and maybe insufficient in some use cases.

>>>> * Should the wrapper's actions be done instead of, or in addition to,
>>>> the main resource's actions? Or maybe even allow the user to choose? I
>>>> could see some wrappers intended to replace the native handling, and
>>>> others to supplement it.
>>>
>>> For my use case, in addition, because the only motivation is to
>>> delegate start/stop/status to systemd (as happens currently with
>>> systemd:* RAs) whilst retaining the ability to do service-level
>>> testing of the resource via the OCF RA.  So it wouldn't really be a
>>> wrapper, but rather an extension.
>>>
>>> In contrast, with the wrapper approach, it sounds like the delegation
>>> would have to happen via systemctl not via Pacemaker's dbus code.  And
>>> if systemctl start/stop really are asynchronous non-blocking, the
>>> delegation would need to be able to wrap these start/stop calls in a
>>> polling loop as previously mentioned, in order to make them
>>> synchronous non-blocking (which is the behaviour I think most people
>>> would expect).
>>
>> Someone else suggested that systemctl is already blocking, which would
>> simplify the wrapper approach.
> 
> I'm not sure, but IIRC Andrew said that it is not reliably blocking.
> Maybe this is in part due to service startup after daemonization,
> which unavoidably happens in an asynchronous manner.

I suppose that would be service-specific. At least systemd's own
handling can be synchronous (I see systemctl requires --no-block to be
asynchronous, so it does seem like this is the case). I think this makes
the wrapper approach much more likely to work. Some polling may be
necessary for specific services that take some time before a monitor
would be OK, but I don't think that's a major issue.
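
To make the idea concrete, here is a minimal sketch of a wrapper's start action, assuming "systemctl start" blocks until systemd considers the unit started, and that the service may still need a moment before an application-level monitor succeeds. The names start_with_poll, systemctl_start, run_start and monitor, and the timeout/interval defaults, are purely illustrative and not taken from any existing RA; only the OCF return code values are standard.

```python
import subprocess
import time

# Standard OCF return codes used by this sketch.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def systemctl_start(unit):
    """Run `systemctl start UNIT` (blocking unless --no-block is given)
    and return its exit status."""
    return subprocess.run(["systemctl", "start", unit]).returncode


def start_with_poll(run_start, monitor, timeout=60.0, interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Start the service, then poll `monitor` until it reports success.

    run_start: callable returning 0 on success (e.g. wraps systemctl_start).
    monitor:   callable returning an OCF return code.
    The timeout here is an assumption of this sketch; in practice the
    cluster manager also enforces its own operation timeout.
    """
    if run_start() != 0:
        return OCF_ERR_GENERIC
    deadline = clock() + timeout
    while True:
        if monitor() == OCF_SUCCESS:
            return OCF_SUCCESS
        if clock() >= deadline:
            return OCF_ERR_GENERIC
        sleep(interval)
```

A wrapper RA might call it as start_with_poll(lambda: systemctl_start("foo.service"), my_monitor), where "foo.service" and my_monitor are placeholders for the real unit name and the service-specific check.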

>>>> * The answers to the above will help decide whether the wrapper is a
>>>> separate resource (with its own parameters, operations, timeouts, etc.),
>>>> or just a property of the main resource.
>>>>
>>>> * If we allow anything other than monitors to be hybridized, I think we
>>>> get into a pacemaker-specific implementation. I don't think it's
>>>> feasible to include this in the OCF standard -- it would essentially
>>>> mandate pacemaker's "resource class" mechanism on all OCF users (which
>>>> is beyond OCF's scope), and would likely break manual/scripted use
>>>> altogether. We could possibly modify OCF for agents so that no
>>>> actions are mandatory, and it's up to the OCF-using software to verify
>>>> that any actions it requires are supported. Or maybe wrappers just
>>>> implement some actions as no-ops, and it's up to the user to know the
>>>> limitations.
>>>
>>> Sure.  Hopefully you will be in Barcelona so we can discuss more?
>>
>> Sadly, no :)
>>
>> If systemctl blocks, the wrapper approach would be easiest -- the
>> wrapper can conform to the OCF standard, and requires no special
>> handling in pacemaker.
> 
> Well, it would still need a common RA library which provides a
> function for writing the systemd service override, right?

That's trivial enough to put in the RA for now. Once we demonstrate this
approach works, we can maybe include it with pacemaker somehow.
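
As a rough illustration of how small that helper could be, the sketch below writes a drop-in file into the unit's "<unit>.d" directory, which is where systemd looks for per-unit overrides. The function name write_unit_override, the file name 50-cluster-override.conf, and the example body are assumptions made for this sketch, not an existing library API; what the override should actually contain is left open here.

```python
import os


def write_unit_override(unit, body, base_dir="/run/systemd/system",
                        name="50-cluster-override.conf"):
    """Write a systemd drop-in for `unit` and return the file's path.

    unit:     unit name, e.g. "foo.service" (placeholder).
    body:     text of the override, e.g. a [Unit] or [Service] section.
    base_dir: directory holding the "<unit>.d" drop-in directories;
              /run/systemd/system is for runtime (non-persistent) units.
    name:     drop-in file name (hypothetical; must end in ".conf").
    """
    dropin_dir = os.path.join(base_dir, unit + ".d")
    os.makedirs(dropin_dir, exist_ok=True)
    path = os.path.join(dropin_dir, name)
    with open(path, "w") as f:
        f.write(body)
    # systemd only picks up the change after `systemctl daemon-reload`;
    # that call is deliberately left to the caller in this sketch.
    return path
```

For example, write_unit_override("foo.service", "[Unit]\nDescription=Cluster Controlled foo\n") would create the drop-in for a hypothetical foo.service; the RA would then reload systemd before starting the unit.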

>> The hybrid approach would require some sort of "almost OCF" standard for
>> the extender, new syntax in pacemaker to configure it, and a good bit of
>> intelligence in pacemaker (e.g. recovery if one part of the hybrid fails
>> or times out).
> 
> Yeah.  I don't think I have a preference on either approach; I'd be
> happy to go with whichever you think is best.  I guess there is an
> argument in favour of trying out a spike on the wrapper approach
> first, since presumably it should be much easier to implement than the
> hybrid approach.  Thoughts?

Agreed.