[ClusterLabs Developers] RA as a systemd wrapper -- the right way?

Andrew Beekhof andrew at beekhof.net
Fri Sep 23 00:46:08 UTC 2016


> On 23 Sep 2016, at 12:49 AM, Ken Gaillot <kgaillot at redhat.com> wrote:
> 
> On 09/22/2016 08:49 AM, Adam Spiers wrote:
>> Ken Gaillot <kgaillot at redhat.com> wrote:
>>> On 09/21/2016 03:25 PM, Adam Spiers wrote:
>>>> Jan Pokorný <jpokorny at redhat.com> wrote:
>>>>> Just thinking aloud before the can is open.
>>>> 
>>>> Thanks for sharing - I'm very interested to hear your ideas on this,
>>>> because I was thinking along somewhat similar lines for the
>>>> openstack-resource-agents repository which I maintain.
>>>> 
>>>> Currently the OpenStack RAs duplicate much of the logic and config of
>>>> corresponding systemd / LSB init scripts for starting / stopping
>>>> OpenStack services and checking their status.  The main difference is
>>>> that RAs also have a "monitor" action which can check the health of
>>>> the service at application level, e.g. via HTTP rather than a naive
>>>> "is this pid running" kind of check.
>>>> 
>>>> This duplication causes issues with portability between Linux
>>>> distributions, since each distribution has a slightly different way of
>>>> starting and stopping the services.  It also results in subtly
>>>> different behaviour for OpenStack clouds depending on whether or not
>>>> they are deployed in HA mode using Pacemaker.
>>>> 
>>>> As a result I have been thinking about the idea of changing the
>>>> start/stop/status actions of these RAs so that they wrap around
>>>> service(8) (which would be even more portable across distros than
>>>> systemctl).
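
The delegation itself would be simple -- a rough sketch only, with the
service name hard-coded and all the OCF boilerplate (metadata, validation,
proper exit codes) left out:

    SERVICE="openstack-nova-api"    # illustrative service name

    ra_start()  { service "$SERVICE" start;  }
    ra_stop()   { service "$SERVICE" stop;   }
    ra_status() { service "$SERVICE" status; }    # LSB: 0 = running, 3 = stopped

The hard part is everything around it, as discussed below.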
>>>> 
>>>> The primary difference with your approach is that we probably wouldn't
>>>> need to make the RAs dynamically create any systemd configuration, since
>>>> that would already be provided by the packages which install the OpenStack
>>>> services.  But then AFAIK none of the OpenStack services use the
>>>> multi-instance feature of systemd (foo@{one,two,three,etc}.service).
>>> 
>>> The main complication I see is that pacemaker expects OCF agents to
>>> return success only after an action is complete. For example, start
>>> should not return until the service is fully active. I believe systemctl
>>> does not behave this way; rather, it initiates the action and returns
>>> immediately.
>> 
>> But that's trivial to work around: polling via "service foo status"
>> after "service foo start" converts it back from an asynchronous
>> operation to a synchronous one.
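
A sketch of that workaround, assuming the action timeout is passed in from
the cluster configuration rather than hard-coded (names are illustrative):

    synchronous_start() {
        local svc="$1" timeout="${2:-60}" i
        service "$svc" start || return 1
        # "start" may only have queued the job, so poll until the service
        # actually reports itself running, or give up at the timeout.
        for i in $(seq 1 "$timeout"); do
            service "$svc" status >/dev/null 2>&1 && return 0
            sleep 1
        done
        return 1
    }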
> 
> Yes, that's exactly what pacemaker does now: start/stop, then every two
> seconds, poll the status.
> 
> However, I'm currently working on a project to change that, so that we
> use DBus signalling to be notified when the job completes, rather than
> (or in addition to) polling.
> 
> The reason is twofold: the two-second wait can be an unnecessary
> recovery delay in some cases; and (at least from the DBus API, not sure
> about systemctl status) there's no reliable way to distinguish "service
> is inactive because the start didn't work properly" from "service is
> inactive because systemd has some slow-starting dependencies of its own
> to start first".

The systemd folks are telling us that the only reliable way to start a service synchronously is by watching DBus, which suggests that a shell-based approach is doomed to fail.
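
For what it's worth, the relevant signal is reachable from a shell too,
though awkwardly. A rough sketch with gdbus follows -- pacemaker itself does
this through the C DBus API, not like this, and the temp-file plumbing here
is purely illustrative:

    unit="foo.service"
    sigs=$(mktemp)

    # Subscribe to systemd's signals before queuing the job (needs root on
    # the system bus).  The sleep is a crude way to let the monitor connect.
    gdbus monitor --system --dest org.freedesktop.systemd1 > "$sigs" &
    monitor_pid=$!
    sleep 1

    # StartUnit queues a job and returns its object path immediately.
    gdbus call --system --dest org.freedesktop.systemd1 \
        --object-path /org/freedesktop/systemd1 \
        --method org.freedesktop.systemd1.Manager.StartUnit "$unit" replace

    # JobRemoved fires when that job finishes and carries a result string
    # ("done", "failed", "timeout", ...), so unlike polling the status we
    # can tell a failed start from one that is merely slow.
    until grep -q "JobRemoved.*${unit}" "$sigs"; do sleep 0.2; done
    grep "JobRemoved.*${unit}" "$sigs"
    kill "$monitor_pid"; rm -f "$sigs"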

> 
>>> Pacemaker's native systemd integration has a lot of workarounds for
>>> quirks in systemd behavior (and more every release). I'm not sure
>>> moving/duplicating that logic to the RA is a good approach.
>> 
>> What other quirks are there?
> 
> When pacemaker starts a systemd service, it creates a unit override in
> /run/systemd/system/<agent>.service.d/50-pacemaker.conf, with these
> overrides (and removes the file when stopping the resource):
> 
> * It prefixes the description with "Cluster Controlled" (e.g. "Postfix
> Mail Transport Agent" -> "Cluster Controlled Postfix Mail Transport
> Agent"). This gives a clear indicator in systemd messages in the syslog
> that it's a cluster resource.
> 
> * "Before=pacemaker.service": This ensures that when someone shuts down
> the system via systemd, systemd doesn't stop pacemaker before pacemaker
> can stop the resource.
> 
> * "Restart=no": This ensures that pacemaker stays in control of
> responding to service failures.
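
For reference, that override ends up looking roughly like this (the unit
name and description are only examples here, and the exact contents vary by
pacemaker version):

    # /run/systemd/system/postfix.service.d/50-pacemaker.conf
    [Unit]
    Description=Cluster Controlled Postfix Mail Transport Agent
    Before=pacemaker.service

    [Service]
    Restart=no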
> 
> 
> Additionally:
> 
> * Pacemaker uses intelligent timeout values (based on cluster
> configuration) when making systemd calls.
> 
> * Pacemaker interprets/remaps systemd return status as needed. For
> example, a stop followed by a status poll that returns "OK" means the
> service is still running. Fairly obvious, but there are a lot of cases
> that need to be handled.
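
A trivial example of that kind of remapping, expressed in RA terms (the
function name and timeout handling are made up for illustration; pacemaker
does this internally in C):

    wait_for_stop() {
        local unit="$1" timeout="${2:-60}" i
        for i in $(seq 1 "$timeout"); do
            # "active" right after a stop request does not mean the stop
            # failed, only that it has not finished yet -- keep waiting.
            systemctl --quiet is-active "$unit" || return 0    # really stopped
            sleep 1
        done
        return 1    # still active at the deadline: treat the stop as failed
    }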
> 
> All of these were added gradually over the past few years, so I'd expect
> the list to grow over the next few years.
> 
> 