[ClusterLabs] Antw: Doing reload right
kgaillot at redhat.com
Fri Jul 1 11:26:59 EDT 2016
On 07/01/2016 04:48 AM, Jan Pokorný wrote:
> On 01/07/16 09:23 +0200, Ulrich Windl wrote:
>>>>> Ken Gaillot <kgaillot at redhat.com> schrieb am 30.06.2016 um 18:58 in Nachricht
>> <57754F9F.8070908 at redhat.com>:
>>> I've been meaning to address the implementation of "reload" in Pacemaker
>>> for a while now, and I think the next release will be a good time, as it
>>> seems to be coming up more frequently.
>>> In the current implementation, Pacemaker considers a resource parameter
>>> "reloadable" if the resource agent supports the "reload" action, and the
>>> agent's metadata marks the parameter with "unique=0". If (only) such
>>> parameters get changed in the resource's pacemaker configuration,
>>> pacemaker will call the agent's reload action rather than the
>>> stop-then-start it usually does for parameter changes.
>>> This is completely broken for two reasons:
>> I agree ;-)
>>> 1. It relies on "unique=0" to determine reloadability. "unique" was
>>> originally intended (and is widely used by existing resource agents) as
>>> a hint to UIs to indicate which parameters uniquely determine a resource
>>> instance. That is, two resource instances should never have the same
>>> value of a "unique" parameter. For this purpose, it makes perfect sense
>>> that (for example) the path to a binary command would have unique=0 --
>>> multiple resource instances could (and likely would) use the same
>>> binary. However, such a parameter could never be reloadable.
>> I tought unique=0 were reloadable (unique=1 were not)...
Correct. By "could never be reloadable", I mean that if someone changes
the location of the daemon binary, there's no way the agent could change
that with anything other than a full restart. So using unique=0 to
indicate reloadable doesn't make sense.
> I see a doubly-distorted picture here:
> - actually "unique=1" on a RA parameter (together with this RA supporting
> "reload") currently leads to reload-on-change
> - also the provided example shows why reload for "unique=0" is wrong,
> but as the opposite applies as of current state, it's not an argument
> why something is broken
> See also:
Nope, unique=1 is used for the *restart* list -- the non-reloadable
>>> 2. Every known resource agent that implements a reload action does so
>>> incorrectly. Pacemaker uses reload for changes in the resource's
>>> *pacemaker* configuration, while all known RAs use reload for a
>>> service's native reload capability of its own configuration file. As an
>>> example, the ocf:heartbeat:named RA calls "rndc reload" for its reload
>>> action, which will have zero effect on any pacemaker-configured
>>> parameters -- and on top of that, the RA uses "unique=0" in its correct
>>> UI sense, and none of those parameters are actually reloadable.
> (per the last subclause, applicable also, after mentioned inversion, for
> "unique=1", such as a pid file path, which cannot be reloadable for
> apparent reason)
>> Maybe LSB confusion...
> That's not entirely fair vindication, as when you have to do some
> extra actions with parameters in LSB-aliased "start" action in the
> RA, you should do such reflections also for "reload".
I think the point is that "reload" for an LSB init script or systemd
unit always reloads the native service configuration, so it's natural
for administrators and developers to think of that when they see "reload".
>>> My proposed solution is:
>>> * Add a new "reloadable" attribute for resource agent metadata, to
>>> indicate reloadable parameters. Pacemaker would use this instead of
>> No objections if you change the XML metadata version number this time ;-)
> Good point, but I guess everyone's a bit scared to open this Pandora
> box as there's so much technical debt connected to that (unifying FA/RA
> metadata if possible, adding new UI-oriented annotations, pacemaker's
> silent additions like "private" parameter).
> I'd imagine an established authority for OCF matters (and maintaing
> https://github.com/ClusterLabs/OCF-spec) and at least partly formalized
> process inspired by Python PEPs for coordinated development:
An update to the OCF spec is long overdue. I wouldn't mind those wheels
starting to turn, but I think this reload change could proceed
independently (though of course coordinated at the appropriate time).
>>> * Add a new "reload-options" RA action for the ability to reload
>>> Pacemaker-configured options. Pacemaker would call this instead if "reload".
>> Why not "reload-parameters"?
> That came to my mind as well. Or not wasting time/space on too many
> letters, just "reload-params", perhaps.
Just being a lazy typist :)
You're right, "parameters" or "params" would be more consistent with
existing usage. "Instance attributes" is probably the most technically
correct term. I'll vote for "reload-params"
>>> * Formalize that "reload" means reload the service's own configuration,
>>> legitimizing the most common existing RA implementations. (Pacemaker
>>> itself will not use this, but tools such as crm_resource might.)
>> Maybe be precise what your "reload-options" is expected to do,
>> compared to the "reload" action. I'm still a bit confused. Maybe a
>> working example...
> IIUIC, reload-options should first reflect the parameters as when
> "start" is invoked, then delegate the responsibility to something
> that triggers as native reload as possible (which was mentioned
> is commonly [and problematically] implemented directly in current
> "reload" actions of common RAs).
See how much confusion there is? ;-)
Let's say we have a custom resource agent that manages a daemon. The
daemon doesn't provide its own pid file, so the resource agent creates
one to track it.
The pacemaker configuration might be:
resource frobnicator pidfile=/var/run/frobnicator.pid
And separately, the daemon has its own configuration file,
/etc/frobinicator.conf, containing "debug: false".
We need two separate reload capabilities:
1. If the user reconfigures pidfile in the pacemaker configuration, the
agent needs to handle that.
2. If the user reconfigures debug in the daemon configuration, the
daemon needs to handle that.
Currently, pacemaker uses "reload" for #1, but all known published
resource agents use "reload" for #2. These two uses need to be
separated. I'm proposing we keep "reload" for #2, since most RAs are
already written that way, and add a new "reload-params" for #1.
>>> * Review all ocf:pacemaker and ocf:heartbeat agents to make sure they
>>> use unique, reloadable, reload, and reload-options properly.
>>> The downside is that this breaks backward compatibility. Any RA that
>>> actually implements unique and reload so that reload works will lose
>>> reload capability until it is updated to the new style.
>> Maybe there's a solution that is even simpler: Keep the action name,
>> but don't use "relaod" unless the RA indicated at least one
>> "reloadable" parameter. Naturally old RAs don't do it. And once an
>> author touches the RA to fix thing, he/she should do so (not just
>> adding "reloadable" to the metadata).
> IMHO, idea worth pursuing
I don't like the idea of varying pacemaker interpretation of one thing
based on the presence of another. That could make troubleshooting very
difficult for someone who's not intimately familiar with the
particulars. (There may be cases where we already do this, but that
doesn't mean I like it.)
>>> While we usually go to great lengths to preserve backward compatibility,
>>> I think it is OK to break it in this case, because most RAs that
>>> implement reload do so wrongly: some implement it as a service reload, a
>>> few advertise reload but don't actually implement it, and others map
>>> reload to start, which might theoretically work in some cases (I'm not
>>> familiar enough with iSCSILogicalUnit and iSCSITarget to be sure), but
>>> typically won't, as the previous service options are not reverted (for
>>> example, I think Route would incorrectly leave the old route in the old
>>> So, I think breaking backward compatibility is actually a good thing
>>> here, since the most reload can do with existing RAs is trigger bad
>> See my comment above.
>>> The opposing view would be that we shouldn't punish any RA writer who
>>> implemented this correctly. However, there's no solution that preserves
>> The software could give a warning if the RA provides "reload", but
>> does not define any reloadble parameter. So to notify users and
>> developers that some change is needed.
I wouldn't say that's an error condition. Someone may legitimately
choose to implement reload from the beginning as a no-op, to make it
easier to add reloadable parameters in the future. Pacemaker only
reloads if both the reload action is supported and a parameter is
reloadable; there's no problem with there being one without the other.
Of course, with the proposed change, we're talking about reload-params here.
If we separate "reloadable" as its own metadata attribute (vs "unique"),
it might make sense to warn if a parameter is marked "reloadable"
without a reload-params action being supported (the inverse of your
suggestion), because that would more indicate an oversight.
>>> backward compatibility with both UI usage of unique and reload usage of
>>> unique. Plus, the worst that would happen is that the RA would stop
>>> being reloadable -- not as bad as the current possibilities from
>>> mis-implemented reload.
>> Agree on that.
>>> My questions are:
>>> Does anyone know of an RA that uses reload correctly? Dummy doesn't
>>> count ;-)
>> Revalidation by the RA authors (and adding "reloadable" attribute to
>> metadata parameters) ;-)
>>> Does anyone object to the (backward-incompatible) solution proposed here?
>> I commented inline.
More information about the Users