[ClusterLabs Developers] RA as a systemd wrapper -- the right way?
Ken Gaillot
kgaillot at redhat.com
Thu Sep 22 00:05:32 CEST 2016
On 09/21/2016 03:25 PM, Adam Spiers wrote:
> Hi Jan,
>
> Jan Pokorný <jpokorny at redhat.com> wrote:
>> Hello,
>>
>> https://github.com/ClusterLabs/resource-agents/pull/846 seems to be
>> a first crack at integrating systemd into otherwise init-system-unaware
>> resource-agents.
>>
>> As pacemaker already handles native systemd integration, I wonder if
>> it wouldn't be better to just allow, on top of that, perhaps as a
>> special "systemd+hooks" class of resources that would also accept a
>> "hooks" (meta) attribute pointing to an executable implementing a
>> formalized API akin to OCF (say on-start, on-stop, meta-data
>> actions) that would take care of initially reflecting on the rest of
>> the parameters + possibly a cleanup later on.

I can see the usefulness of having "hooks" for OS resources
(systemd/lsb/upstart/service). Let pacemaker start and stop the resource
via the OS mechanism, but do a little bit of extra housekeeping.

It could easily get ugly, though: version dependencies, extra overhead, etc.

>> Technically, something akin to injecting Environment, ExecStartPre
>> and ExecStopPost to the service definition might also achieve the
>> same goal if there's a transparent way to do it from pacemaker using
>> just systemd API (I don't know).

Sure, pacemaker already creates a unit override before starting a
systemd resource. It would be trivial to add this. It could even simply
be configured as meta-attributes of systemd resources.

However, that wouldn't let you change the behavior of a status call, for
example.

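To make the override idea concrete, here is a minimal sketch of what generating such a drop-in could look like. The function name and its parameters are hypothetical and are not pacemaker code; only the [Service] directives (Environment, ExecStartPre, ExecStopPost) are standard systemd syntax.

```python
def render_override(environment, exec_start_pre=None, exec_stop_post=None):
    """Return the text of a systemd drop-in carrying optional hook commands.

    Hypothetical helper for illustration only; pacemaker's real override
    handling is not shown here.
    """
    lines = ["[Service]"]
    for name, value in sorted(environment.items()):
        # Quoted NAME=value assignment, so values containing spaces survive.
        lines.append('Environment="{}={}"'.format(name, value))
    if exec_start_pre:
        # Runs before the unit's own ExecStart command.
        lines.append("ExecStartPre={}".format(exec_start_pre))
    if exec_stop_post:
        # Runs after the service has stopped.
        lines.append("ExecStopPost={}".format(exec_stop_post))
    return "\n".join(lines) + "\n"
```

Nothing in a fragment like this touches how status is determined, which is the limitation noted above.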
>> Indeed, the scenario I have in mind would make do with a separate
>> "prepare grounds" agent, suitably grouped with such a systemd-class
>> resource, but that seems more fragile configuration-wise (this
>> is not the granularity a cluster administrator would be supposed
>> to be thinking in, IMHO, just as with the ocf class).

That isn't pretty either, but it's probably the best approach currently.
There are some non-obvious pitfalls when writing a "secondary" OCF agent
like this, but it's easy to document what they are and how to avoid them.

Nagios agents are another possibility; essentially, they implement a
status action and nothing else. So, a systemd resource + nagios resource
would provide an application-aware status.

Constraints and failure handling become trickier with this "two agents"
approach.

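For illustration, an application-aware check in the Nagios plugin style might look roughly like the sketch below. The function names and the HTTP probe are assumptions chosen for the example; the exit codes 0/1/2/3 follow the usual Nagios plugin convention.

```python
import urllib.request

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def fetch_status(url, timeout):
    """Return the HTTP status code for url; raises if the request fails."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def check_http(url, timeout=5.0, fetch=fetch_status):
    """Hypothetical check: map an HTTP probe to (exit_code, message)."""
    try:
        status = fetch(url, timeout)
    except Exception as exc:
        # Connection errors, timeouts and HTTP error responses (which
        # urlopen raises as exceptions) all count as a failed check.
        return CRITICAL, "CRITICAL: {}".format(exc)
    if status == 200:
        return OK, "OK: HTTP 200"
    return WARNING, "WARNING: HTTP {}".format(status)
```

A real plugin would print the message and exit with the code. The `fetch` parameter exists only so the probe can be swapped out when testing.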
>> Just thinking aloud before the can is open.
>
> Thanks for sharing - I'm very interested to hear your ideas on this,
> because I was thinking along somewhat similar lines for the
> openstack-resource-agents repository which I maintain.
>
> Currently the OpenStack RAs duplicate much of the logic and config of
> corresponding systemd / LSB init scripts for starting / stopping
> OpenStack services and checking their status. The main difference is
> that RAs also have a "monitor" action which can check the health of
> the service at application level, e.g. via HTTP rather than a naive
> "is this pid running" kind of check.
>
> This duplication causes issues with portability between Linux
> distributions, since each distribution has a slightly different way of
> starting and stopping the services. It also results in subtly
> different behaviour for OpenStack clouds depending on whether or not
> they are deployed in HA mode using Pacemaker.
>
> As a result I have been thinking about the idea of changing the
> start/stop/status actions of these RAs so that they wrap around
> service(8) (which would be even more portable across distros than
> systemctl).
>
> The primary difference with your approach is that we probably wouldn't
> need to make the RAs dynamically create any systemd configuration, since
> that would already be provided by the packages which install the OpenStack
> services. But then AFAIK none of the OpenStack services use the
> multi-instance feature of systemd (foo@{one,two,three,etc}.service).
>
> Cheers,
> Adam

The main complication I see is that pacemaker expects OCF agents to
return success only after an action is complete. For example, start
should not return until the service is fully active. I believe systemctl
does not behave this way, rather it initiates the action and returns
immediately.

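As a rough sketch of what a wrapping RA would have to do under that assumption, the start action could poll until the unit reports active before returning. The helper names, timeout and polling interval below are hypothetical; `systemctl start`, `systemctl is-active --quiet` and the OCF return codes 0 and 1 are the standard ones.

```python
import subprocess
import time

OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def is_active(unit):
    """True if `systemctl is-active` reports the unit as active."""
    return subprocess.call(["systemctl", "is-active", "--quiet", unit]) == 0


def start_and_wait(unit, timeout=60.0, interval=1.0,
                   run=subprocess.call, probe=is_active,
                   clock=time.monotonic, sleep=time.sleep):
    """Hypothetical start action: return success only once the unit is active."""
    if run(["systemctl", "start", unit]) != 0:
        return OCF_ERR_GENERIC
    deadline = clock() + timeout
    while True:
        if probe(unit):
            return OCF_SUCCESS
        if clock() >= deadline:
            # Gave up waiting; report failure rather than a premature success.
            return OCF_ERR_GENERIC
        sleep(interval)
```

This covers only the simplest case and none of the systemd quirks mentioned below. The `run`, `probe`, `clock` and `sleep` parameters are there so the loop can be exercised without a real systemd.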
Pacemaker's native systemd integration has a lot of workarounds for
quirks in systemd behavior (and more every release). I'm not sure
moving/duplicating that logic to the RA is a good approach.