[ClusterLabs Developers] RA as a systemd wrapper -- the right way?

Adam Spiers aspiers at suse.com
Wed Feb 8 13:57:16 EST 2017


Ken Gaillot <kgaillot at redhat.com> wrote:
>On 11/03/2016 02:37 PM, Adam Spiers wrote:
>> Hi again Ken,
>>
>> Sorry for the delayed reply, caused by Barcelona amongst other things ...

Hmm, seems I have to apologise for yet another delayed reply :-(  I'm
deliberately not trimming the context, so everyone can refresh their
memory of this old thread!

>> Ken Gaillot <kgaillot at redhat.com> wrote:
>>> On 10/21/2016 07:40 PM, Adam Spiers wrote:
>>>> Ken Gaillot <kgaillot at redhat.com> wrote:
>>>>> On 09/26/2016 09:15 AM, Adam Spiers wrote:
>>>>>> For example, could Pacemaker be extended to allow hybrid resources,
>>>>>> where some actions (such as start, stop, status) are handled by (say)
>>>>>> the systemd backend, and other actions (such as monitor) are handled
>>>>>> by (say) the OCF backend?  Then we could cleanly rely on dbus for
>>>>>> collaborating with systemd, whilst adding arbitrarily complex
>>>>>> monitoring via OCF RAs.  That would have several advantages:
>>>>>>
>>>>>> 1. Get rid of grotesque layering violations and maintenance boundaries
>>>>>>    where the OCF RA duplicates knowledge of all kinds of things which
>>>>>>    are distribution-specific, e.g.:
>>>>>>
>>>>>>      https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/apache#L56
>>>>>
>>>>> A simplified agent will likely still need distro-specific intelligence
>>>>> to do even a limited subset of actions, so I'm not sure there's a gain
>>>>> there.
>>>>
>>>> What distro-specific intelligence would it need?  If the OCF RA was
>>>> only responsible for monitoring, it wouldn't need to know a lot of the
>>>> things which are only required for starting / stopping the service and
>>>> checking whether it's running, e.g.:
>>>>
>>>>   - Name of the daemon executable
>>>>   - uid/gid it should be started as
>>>>   - Daemon CLI arguments
>>>>   - Location of pid file
>>>>
>>>> In contrast, an OCF RA only responsible for monitoring would only need
>>>> to know how to talk to the service, which is not typically
>>>> distro-specific; in the REST API case, it only needs to know the endpoint
>>>> URL, which would be configured via Pacemaker resource parameters anyway.
>>>
>>> If you're only talking about monitors, that does simplify things. As you
>>> mention, you'd still need to configure resource parameters that would
>>> only be relevant to the enhanced monitor action -- parameters that other
>>> actions might also need but obtain elsewhere, so there's the minor admin
>>> complication of setting the same value in multiple places.
>>
>> Which same value(s)?
>
>Nothing particular in mind, just app-specific information that's
>required in both the app's own configuration and in the resource
>configuration. Even stuff like an IP address/port.

That could potentially be reduced by simply pointing the RA at the
same config file(s) which the systemd service definition uses, and
then it could use something like crudini (in the OpenStack case) to
extract values like the port the service is listening on.  (Assuming
the service is listening on localhost, you could simply use that
instead of the external IP address.)  So yes, you'd have to set the
location of the config file(s) in two places, but at least then you
wouldn't have to duplicate anything else.
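
Something like this inside the RA's monitor action, say (totally
untested, and the config file path, section/key names, and URL below
are purely illustrative, not taken from any real RA):

    # Hypothetical monitor helper: pull the listening port out of the
    # same config file the systemd unit uses, then probe the service's
    # API endpoint.  All names here are illustrative assumptions.
    conf="/etc/nova/nova.conf"
    port=$(crudini --get "$conf" DEFAULT osapi_compute_listen_port) \
        || return $OCF_ERR_CONFIGURED
    if curl -fsS "http://localhost:${port}/" >/dev/null; then
        return $OCF_SUCCESS
    else
        return $OCF_NOT_RUNNING
    fi

That way the only thing duplicated between the unit file and the
resource configuration really is the location of the config file.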

>> In the OpenStack case (which is the only use case I have), I don't
>> think this will happen, because the "monitor" action only needs to
>> know the endpoint URL and associated credentials, which doesn't
>> overlap with what the other actions need to know.  This separation of
>> concerns feels right to me: the start/stop/status actions are
>> responsible for managing the state of the service, and the monitor
>> action is responsible for monitoring whether it's delivering what it
>> should be.  It's just like the separation between admins and end
>> users.
>>
>>>>>> 2. Drastically simplify OCF RAs by delegating start/stop/status etc.
>>>>>>    to systemd, thereby increasing readability and reducing maintenance
>>>>>>    burden.
>>>>>>
>>>>>> 3. OCF RAs are more likely to work out of the box with any distro,
>>>>>>    or at least require less work to get working.
>>>>>>
>>>>>> 4. Services behave more similarly regardless of whether managed by
>>>>>>    Pacemaker or the standard pid 1 service manager.  For example, they
>>>>>>    will always use the same pidfile, run as the same user, in the
>>>>>>    right cgroup, be invoked with the same arguments etc.
>>>>>>
>>>>>> 5. Pacemaker can still monitor services accurately at the
>>>>>>    application-level, rather than just relying on naive pid-level
>>>>>>    monitoring.
>>>>>>
>>>>>> Or is this a terrible idea? ;-)
>>>>>
>>>>> I considered this, too. I don't think it's a terrible idea, but it does
>>>>> pose its own questions.
>>>>>
>>>>> * What hybrid actions should be allowed? It seems dangerous to allow
>>>>> starting from one code base and stopping from another, or vice versa,
>>>>> and really dangerous to allow something like migrate_to/migrate_from to
>>>>> be reimplemented. At one extreme, we allow anything and leave that
>>>>> responsibility on the user; at the other, we only allow higher-level
>>>>> monitors (i.e. using OCF_CHECK_LEVEL) to be hybridized.
>>>>
>>>> Just monitors would be good enough for me.
>>>
>>> The tomcat RA (which could also benefit from something like this) would
>>> extend start and stop as well, e.g. start = systemctl start plus some
>>> bookkeeping.
>>
>> Ahh OK, interesting.  What kind of bookkeeping?
>
>I don't remember ... something like node attributes or a pid/status
>file. That could potentially be handled by separate OCF resources for
>those bits, grouped with the systemd+monitor resource, but that would be
>hacky and maybe insufficient in some use cases.

I have no idea if the proposal makes sense in the case of the tomcat
RA.  But if it doesn't, I hope that alone doesn't rule out the
proposal being implemented to satisfy other use cases :-)

>>>>> * Should the wrapper's actions be done instead of, or in addition to,
>>>>> the main resource's actions? Or maybe even allow the user to choose? I
>>>>> could see some wrappers intended to replace the native handling, and
>>>>> others to supplement it.
>>>>
>>>> For my use case, in addition, because the only motivation is to
>>>> delegate start/stop/status to systemd (as happens currently with
>>>> systemd:* RAs) whilst retaining the ability to do service-level
>>>> testing of the resource via the OCF RA.  So it wouldn't really be a
>>>> wrapper, but rather an extension.
>>>>
>>>> In contrast, with the wrapper approach, it sounds like the delegation
>>>> would have to happen via systemctl, not via Pacemaker's dbus code.  And
>>>> if systemctl start/stop really are asynchronous (non-blocking), the
>>>> delegation would need to be able to wrap these start/stop calls in a
>>>> polling loop as previously mentioned, in order to make them
>>>> synchronous (which is the behaviour I think most people would
>>>> expect).
>>>
>>> Someone else suggested that systemctl is already blocking, which would
>>> simplify the wrapper approach.
>>
>> I'm not sure, but IIRC Andrew said that it is not reliably blocking.
>> Maybe this is in part due to service startup after daemonization,
>> which unavoidably happens in an asynchronous manner.
>
>I suppose that would be service-specific.

Most likely, yes.

>At least systemd's own
>handling can be synchronous (I see systemctl requires --no-block to be
>asynchronous, so it does seem like this is the case). I think this makes
>the wrapper approach much more likely to work. Some polling may be
>necessary for specific services that take some time before a monitor
>would be OK, but I don't think that's a major issue.

Yeah, that sounds OK to me.
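
To make that concrete, the kind of thing I have in mind for the
wrapper's start action is roughly this (just a sketch; the unit name
is a placeholder, and wrapper_monitor would be the RA's own monitor
function):

    # Rough sketch: delegate the actual start to systemd, then poll
    # our own monitor action until the service really answers.  If it
    # never does, Pacemaker's operation timeout will kill this action,
    # so no explicit deadline is needed in the loop.
    wrapper_start() {
        systemctl start openstack-nova-api.service \
            || return $OCF_ERR_GENERIC
        while ! wrapper_monitor; do
            sleep 2
        done
        return $OCF_SUCCESS
    }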

>>>>> * The answers to the above will help decide whether the wrapper is a
>>>>> separate resource (with its own parameters, operations, timeouts, etc.),
>>>>> or just a property of the main resource.
>>>>>
>>>>> * If we allow anything other than monitors to be hybridized, I think we
>>>>> get into a pacemaker-specific implementation. I don't think it's
>>>>> feasible to include this in the OCF standard -- it would essentially
>>>>> mandate pacemaker's "resource class" mechanism on all OCF users (which
>>>>> is beyond OCF's scope), and would likely break manual/scripted use
>>>>> altogether. We could possibly modify OCF so that no
>>>>> actions are mandatory, and it's up to the OCF-using software to verify
>>>>> that any actions it requires are supported. Or maybe wrappers just
>>>>> implement some actions as no-ops, and it's up to the user to know the
>>>>> limitations.
>>>>
>>>> Sure.  Hopefully you will be in Barcelona so we can discuss more?
>>>
>>> Sadly, no :)
>>>
>>> If systemctl blocks, the wrapper approach would be easiest -- the
>>> wrapper can conform to the OCF standard, and requires no special
>>> handling in pacemaker.
>>
>> Well, it would still need a common RA library which provides a
>> function for writing the systemd service override, right?
>
>That's trivial enough to put in the RA for now. Once we demonstrate this
>approach works, we can maybe include it with pacemaker somehow.

Makes sense.
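
For the record, I'd imagine that helper looking roughly like this
(the drop-in path and the Restart=no directive are my guesses at
what's needed to stop systemd from restarting the service behind
Pacemaker's back):

    # Sketch: write a systemd drop-in override for the given unit,
    # then reload systemd so the override takes effect.
    systemd_override() {
        local unit="$1"
        local dir="/etc/systemd/system/${unit}.d"
        mkdir -p "$dir"
        printf '[Service]\nRestart=no\n' \
            > "${dir}/50-pacemaker.conf"
        systemctl daemon-reload
    }

so the RA would just call e.g. "systemd_override
openstack-nova-api.service" before the first systemctl start.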

>>> The hybrid approach would require some sort of "almost OCF" standard for
>>> the extender, new syntax in pacemaker to configure it, and a good bit of
>>> intelligence in pacemaker (e.g. recovery if one part of the hybrid fails
>>> or times out).
>>
>> Yeah.  I don't think I have a preference on either approach; I'd be
>> happy to go with whichever you think is best.  I guess there is an
>> argument in favour of trying out a spike on the wrapper approach
>> first, since presumably it should be much easier to implement than the
>> hybrid approach.  Thoughts?
>
>Agreed

OK, so I guess this is blocking on me to do the spike.  I'll be at the
Atlanta PTG in a couple of weeks, so if anyone else reading this
thread happens to be involved in OpenStack and attending the PTG, drop
me a line!



