[ClusterLabs Developers] RA as a systemd wrapper -- the right way?

Thu Sep 22 14:49:37 UTC 2016

On 09/22/2016 08:49 AM, Adam Spiers wrote:
> Ken Gaillot <kgaillot at redhat.com> wrote:
>> On 09/21/2016 03:25 PM, Adam Spiers wrote:
>>> Jan Pokorný <jpokorny at redhat.com> wrote:
>>>> Just thinking aloud before the can is open.
>>>
>>> Thanks for sharing - I'm very interested to hear your ideas on this,
>>> because I was thinking along somewhat similar lines for the
>>> openstack-resource-agents repository which I maintain.
>>>
>>> Currently the OpenStack RAs duplicate much of the logic and config of
>>> corresponding systemd / LSB init scripts for starting / stopping
>>> OpenStack services and checking their status.  The main difference is
>>> that RAs also have a "monitor" action which can check the health of
>>> the service at application level, e.g. via HTTP rather than a naive
>>> "is this pid running" kind of check.
>>>
>>> This duplication causes issues with portability between Linux
>>> distributions, since each distribution has a slightly different way of
>>> starting and stopping the services.  It also results in subtly
>>> different behaviour for OpenStack clouds depending on whether or not
>>> they are deployed in HA mode using Pacemaker.
>>>
>>> As a result I have been thinking about the idea of changing the
>>> start/stop/status actions of these RAs so that they wrap around
>>> service(8) (which would be even more portable across distros than
>>> systemctl).
>>>
>>> The primary difference with your approach is that we probably wouldn't
>>> need to make the RAs dynamically create any systemd configuration, since
>>> that would already be provided by the packages which install the OpenStack
>>> services.  But then AFAIK none of the OpenStack services use the
>>> multi-instance feature of systemd (foo@{one,two,three,etc}.service).
>>
>> The main complication I see is that pacemaker expects OCF agents to
>> return success only after an action is complete. For example, start
>> should not return until the service is fully active. I believe systemctl
>> does not behave this way, rather it initiates the action and returns
>> immediately.
> 
> But that's trivial to work around: polling via "service foo status"
> after "service foo start" converts it back from an asynchronous
> operation to a synchronous one.

Yes, that's exactly what pacemaker does now: start/stop, then every two
seconds, poll the status.
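
To make the start-then-poll pattern concrete, here is a minimal sketch in Python. It is not pacemaker's actual code: the function name, the injected start/is_active callables, and the default timeout are assumptions for illustration only. The two-second interval is the one mentioned above.

```python
import time
from typing import Callable


def start_and_wait(
    start: Callable[[], None],
    is_active: Callable[[], bool],
    timeout: float = 20.0,   # assumed value, not taken from pacemaker
    interval: float = 2.0,   # the two-second poll described above
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Kick off an asynchronous start, then poll until active or timed out."""
    start()  # may return before the service is actually up
    deadline = clock() + timeout
    while True:
        if is_active():
            return True   # only now is it safe to report success
        if clock() >= deadline:
            return False  # never became active within the timeout
        sleep(interval)
```

In an agent, start and is_active would wrap whatever the platform offers, for example "service foo start" and "service foo status" as discussed above. Note that a False result here cannot say why the service is still inactive, which is the second problem described below.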

However, I'm currently working on a project to change that, so that we
use DBus signalling to be notified when the job completes, rather than
(or in addition to) polling.

The reason is twofold: the two-second wait can be an unnecessary
recovery delay in some cases; and (at least from the DBus API, not sure
about systemctl status) there's no reliable way to distinguish "service
is inactive because the start didn't work properly" from "service is
inactive because systemd has some slow-starting dependencies of its own
to start first".

>> Pacemaker's native systemd integration has a lot of workarounds for
>> quirks in systemd behavior (and more every release). I'm not sure
>> moving/duplicating that logic to the RA is a good approach.
> 
> What other quirks are there?

When pacemaker starts a systemd service, it creates a unit override in
/run/systemd/system/<agent>.service.d/50-pacemaker.conf, with these
overrides (and removes the file when stopping the resource):

* It prefixes the description with "Cluster Controlled" (e.g. "Postfix
Mail Transport Agent" -> "Cluster Controlled Postfix Mail Transport
Agent"). This gives a clear indicator in systemd messages in the syslog
that it's a cluster resource.

* "Before=pacemaker.service": This ensures that when someone shuts down
the system via systemd, systemd doesn't stop pacemaker before pacemaker
can stop the resource.

* "Restart=no": This ensures that pacemaker stays in control of
responding to service failures.
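
Put together, the three overrides above amount to a small drop-in file. The sketch below only builds the path and the text; the helper names are hypothetical, and the exact layout pacemaker writes may differ. The section placement ([Unit] for Description and Before, [Service] for Restart) follows normal systemd unit syntax.

```python
def override_path(agent: str) -> str:
    """Path of the drop-in for a given unit name (hypothetical helper)."""
    return f"/run/systemd/system/{agent}.service.d/50-pacemaker.conf"


def render_override(description: str) -> str:
    """Text of a drop-in carrying the three overrides listed above."""
    return (
        "[Unit]\n"
        # Mark the unit as cluster-managed in systemd's log messages.
        f"Description=Cluster Controlled {description}\n"
        # Start before pacemaker, so at shutdown systemd stops pacemaker
        # first and pacemaker gets the chance to stop the resource.
        "Before=pacemaker.service\n"
        "\n"
        "[Service]\n"
        # Leave failure recovery to the cluster, not to systemd.
        "Restart=no\n"
    )


if __name__ == "__main__":
    print(override_path("postfix"))
    print(render_override("Postfix Mail Transport Agent"))
```

A wrapper RA that wanted the same behaviour would also have to remove the file again on stop and ask systemd to reload its configuration after each change.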

Additionally:

* Pacemaker uses intelligent timeout values (based on cluster
configuration) when making systemd calls.

* Pacemaker interprets/remaps systemd return status as needed. For
example, a stop followed by a status poll that returns "OK" means the
service is still running. Fairly obvious, but there are a lot of cases
that need to be handled.
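
As an illustration of the remapping point, here is a deliberately tiny sketch covering only the stop/start example above. The names (Outcome, interpret_poll) are made up for this message, and the real logic handles many more cases than these two.

```python
from enum import Enum


class Outcome(Enum):
    DONE = "done"        # the requested action has completed
    PENDING = "pending"  # keep polling (until the action times out)


def interpret_poll(action: str, service_running: bool) -> Outcome:
    """Interpret a status poll in light of the action that preceded it."""
    if action == "start":
        # Running after a start means the start has finished.
        return Outcome.DONE if service_running else Outcome.PENDING
    if action == "stop":
        # A status of "OK" (running) after a stop means the stop has
        # NOT finished, so the same status reads the opposite way.
        return Outcome.PENDING if service_running else Outcome.DONE
    raise ValueError(f"unsupported action: {action!r}")
```

The point is that the same status result has to be read differently depending on which action is in flight, so a wrapper cannot simply pass the status exit code through.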

All of these were added gradually over the past few years, so I'd expect
the list to grow over the next few years.