[ClusterLabs] Antw: Doing reload right

Thu Jul 14 04:00:23 UTC 2016

On Sat, Jul 2, 2016 at 1:26 AM, Ken Gaillot <kgaillot at redhat.com> wrote:
> On 07/01/2016 04:48 AM, Jan Pokorný wrote:
>> On 01/07/16 09:23 +0200, Ulrich Windl wrote:
>>>>>> Ken Gaillot <kgaillot at redhat.com> schrieb am 30.06.2016 um 18:58 in Nachricht
>>> <57754F9F.8070908 at redhat.com>:
>>>> I've been meaning to address the implementation of "reload" in Pacemaker
>>>> for a while now, and I think the next release will be a good time, as it
>>>> seems to be coming up more frequently.
>>>>
>>>> In the current implementation, Pacemaker considers a resource parameter
>>>> "reloadable" if the resource agent supports the "reload" action, and the
>>>> agent's metadata marks the parameter with "unique=0". If (only) such
>>>> parameters get changed in the resource's pacemaker configuration,
>>>> pacemaker will call the agent's reload action rather than the
>>>> stop-then-start it usually does for parameter changes.
>>>>
>>>> This is completely broken for two reasons:
>>>
>>> I agree ;-)
>>>
>>>>
>>>> 1. It relies on "unique=0" to determine reloadability. "unique" was
>>>> originally intended (and is widely used by existing resource agents) as
>>>> a hint to UIs to indicate which parameters uniquely determine a resource
>>>> instance. That is, two resource instances should never have the same
>>>> value of a "unique" parameter. For this purpose, it makes perfect sense
>>>> that (for example) the path to a binary command would have unique=0 --
>>>> multiple resource instances could (and likely would) use the same
>>>> binary. However, such a parameter could never be reloadable.
>>>
>>> I tought unique=0 were reloadable (unique=1 were not)...
>
> Correct. By "could never be reloadable", I mean that if someone changes
> the location of the daemon binary, there's no way the agent could change
> that with anything other than a full restart. So using unique=0 to
> indicate reloadable doesn't make sense.
>
>> I see a doubly-distorted picture here:
>> - actually "unique=1" on a RA parameter (together with this RA supporting
>>   "reload") currently leads to reload-on-change
>> - also the provided example shows why reload for "unique=0" is wrong,
>>   but as the opposite applies as of current state, it's not an argument
>>   why something is broken
>>
>> See also:
>> https://github.com/ClusterLabs/pacemaker/commit/2f5d44d4406e9a8fb5b380cb56ab8a70d7ad9c23
>
> Nope, unique=1 is used for the *restart* list -- the non-reloadable
> parameters.
>
>>>> 2. Every known resource agent that implements a reload action does so
>>>> incorrectly. Pacemaker uses reload for changes in the resource's
>>>> *pacemaker* configuration, while all known RAs use reload for a
>>>> service's native reload capability of its own configuration file. As an
>>>> example, the ocf:heartbeat:named RA calls "rndc reload" for its reload
>>>> action, which will have zero effect on any pacemaker-configured
>>>> parameters -- and on top of that, the RA uses "unique=0" in its correct
>>>> UI sense, and none of those parameters are actually reloadable.
>>
>> (per the last subclause, applicable also, after mentioned inversion, for
>> "unique=1", such as a pid file path, which cannot be reloadable for
>> apparent reason)
>>
>>> Maybe LSB confusion...
>>
>> That's not entirely fair vindication, as when you have to do some
>> extra actions with parameters in LSB-aliased "start" action in the
>> RA, you should do such reflections also for "reload".
>
> I think the point is that "reload" for an LSB init script or systemd
> unit always reloads the native service configuration, so it's natural
> for administrators and developers to think of that when they see "reload".

But also because LSB have no parameters to think about.

>
>>>> My proposed solution is:
>>>>
>>>> * Add a new "reloadable" attribute for resource agent metadata, to
>>>> indicate reloadable parameters. Pacemaker would use this instead of
>>>> "unique".
>>>
>>> No objections if you change the XML metadata version number this time ;-)
>>
>> Good point, but I guess everyone's a bit scared to open this Pandora
>> box as there's so much technical debt connected to that (unifying FA/RA
>> metadata if possible, adding new UI-oriented annotations, pacemaker's
>> silent additions like "private" parameter).
>> I'd imagine an established authority for OCF matters (and maintaing
>> https://github.com/ClusterLabs/OCF-spec) and at least partly formalized
>> process inspired by Python PEPs for coordinated development:
>> https://www.python.org/dev/peps/pep-0001/
>>
>> </dreams>
>
> An update to the OCF spec is long overdue. I wouldn't mind those wheels
> starting to turn, but I think this reload change could proceed
> independently (though of course coordinated at the appropriate time).
>
>>>> * Add a new "reload-options" RA action for the ability to reload
>>>> Pacemaker-configured options. Pacemaker would call this instead if "reload".
>>>
>>> Why not "reload-parameters"?
>>
>> That came to my mind as well.  Or not wasting time/space on too many
>> letters, just "reload-params", perhaps.
>>
>> </bike-shedding>
>
> Just being a lazy typist :)
>
> You're right, "parameters" or "params" would be more consistent with
> existing usage. "Instance attributes" is probably the most technically
> correct term. I'll vote for "reload-params"
>
>>>> * Formalize that "reload" means reload the service's own configuration,
>>>> legitimizing the most common existing RA implementations. (Pacemaker
>>>> itself will not use this, but tools such as crm_resource might.)
>>>
>>> Maybe be precise what your "reload-options" is expected to do,
>>> compared to the "reload" action.  I'm still a bit confused. Maybe a
>>> working example...
>>
>> IIUIC, reload-options should first reflect the parameters as when
>> "start" is invoked, then delegate the responsibility to something
>> that triggers as native reload as possible (which was mentioned
>> is commonly [and problematically] implemented directly in current
>> "reload" actions of common RAs).
>
> See how much confusion there is? ;-)
>
> Let's say we have a custom resource agent that manages a daemon. The
> daemon doesn't provide its own pid file, so the resource agent creates
> one to track it.
>
> The pacemaker configuration might be:
>
>   resource frobnicator pidfile=/var/run/frobnicator.pid
>
> And separately, the daemon has its own configuration file,
> /etc/frobinicator.conf, containing "debug: false".
>
> We need two separate reload capabilities:
>
> 1. If the user reconfigures pidfile in the pacemaker configuration, the
> agent needs to handle that.

Not sure thats a great example.
The location of a pidfile arguably shouldn't be reloadable because
while it can create a new one, the old will be left around too (and if
anyone else is looking in the old location etc etc).

>
> 2. If the user reconfigures debug in the daemon configuration, the
> daemon needs to handle that.
>
> Currently, pacemaker uses "reload" for #1, but all known published
> resource agents use "reload" for #2. These two uses need to be
> separated. I'm proposing we keep "reload" for #2, since most RAs are
> already written that way, and add a new "reload-params" for #1.

Disagree. 'reload' should do and was always intended to do both.  No
limitation of scope was ever intended, you're supposed to reload
everything about the resource and if your resource reads a config file
- do that too.

Yes, I screwed up the "this is how you indicate when a reload should
be preformed" part, but defining a new function isn't going to solve
the "people only write half of reload".

Something for consideration... should this be reloadable and if so how?

   resource frobnicator conf_file=/var/lib/frobnicator.conf

>
>>>> * Review all ocf:pacemaker and ocf:heartbeat agents to make sure they
>>>> use unique, reloadable, reload, and reload-options properly.
>>>>
>>>> The downside is that this breaks backward compatibility. Any RA that
>>>> actually implements unique and reload so that reload works will lose
>>>> reload capability until it is updated to the new style.
>>>
>>> Maybe there's a solution that is even simpler: Keep the action name,
>>> but don't use "relaod" unless the RA indicated at least one
>>> "reloadable" parameter. Naturally old RAs don't do it.

Which is fine because as we already established, the way they're all
written is broken.

>>> And once an
>>> author touches the RA to fix thing, he/she should do so (not just
>>> adding "reloadable" to the metadata).
>>
>> IMHO, idea worth pursuing
>
> I don't like the idea of varying pacemaker interpretation of one thing
> based on the presence of another.

I don't think thats happening.

We would be changing the feature to work as the RA authors expected
for older RA versions (by never using their reload function because it
doesn't do what Pacemaker requested of it) and as Pacemaker always
expected for RAs updated to the improved metadata format.

Seems win-win to me.

> That could make troubleshooting very
> difficult for someone who's not intimately familiar with the
> particulars. (There may be cases where we already do this, but that
> doesn't mean I like it.)
>
>>>> While we usually go to great lengths to preserve backward compatibility,
>>>> I think it is OK to break it in this case, because most RAs that
>>>> implement reload do so wrongly: some implement it as a service reload, a
>>>> few advertise reload but don't actually implement it, and others map
>>>> reload to start, which might theoretically work in some cases (I'm not
>>>> familiar enough with iSCSILogicalUnit and iSCSITarget to be sure), but
>>>> typically won't, as the previous service options are not reverted (for
>>>> example, I think Route would incorrectly leave the old route in the old
>>>> table).
>>>>
>>>> So, I think breaking backward compatibility is actually a good thing
>>>> here, since the most reload can do with existing RAs is trigger bad
>>>> behavior.
>>>
>>> See my comment above.
>>>
>>>>
>>>> The opposing view would be that we shouldn't punish any RA writer who
>>>> implemented this correctly. However, there's no solution that preserves
>>>
>>> The software could give a warning if the RA provides "reload", but
>>> does not define any reloadble parameter. So to notify users and
>>> developers that some change is needed.
>
> I wouldn't say that's an error condition. Someone may legitimately
> choose to implement reload from the beginning as a no-op, to make it
> easier to add reloadable parameters in the future. Pacemaker only
> reloads if both the reload action is supported and a parameter is
> reloadable; there's no problem with there being one without the other.
>
> Of course, with the proposed change, we're talking about reload-params here.
>
> If we separate "reloadable" as its own metadata attribute (vs "unique"),
> it might make sense to warn if a parameter is marked "reloadable"
> without a reload-params action being supported (the inverse of your
> suggestion), because that would more indicate an oversight.
>
>>>> backward compatibility with both UI usage of unique and reload usage of
>>>> unique. Plus, the worst that would happen is that the RA would stop
>>>> being reloadable -- not as bad as the current possibilities from
>>>> mis-implemented reload.
>>>
>>> Agree on that.
>>>
>>>>
>>>> My questions are:
>>>>
>>>> Does anyone know of an RA that uses reload correctly? Dummy doesn't
>>>> count ;-)
>>>
>>> Revalidation by the RA authors (and adding "reloadable" attribute to
>>> metadata parameters) ;-)
>>>
>>>>
>>>> Does anyone object to the (backward-incompatible) solution proposed here?
>>>
>>> I commented inline.
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org