[ClusterLabs] Doing reload right

Fri Jul 1 15:33:56 UTC 2016

On 07/01/2016 03:48 AM, Klaus Wenninger wrote:
> On 06/30/2016 06:58 PM, Ken Gaillot wrote:
>> Hello all,
>>
>> I've been meaning to address the implementation of "reload" in Pacemaker
>> for a while now, and I think the next release will be a good time, as it
>> seems to be coming up more frequently.
>>
>> In the current implementation, Pacemaker considers a resource parameter
>> "reloadable" if the resource agent supports the "reload" action, and the
>> agent's metadata marks the parameter with "unique=0". If (only) such
>> parameters get changed in the resource's pacemaker configuration,
>> pacemaker will call the agent's reload action rather than the
>> stop-then-start it usually does for parameter changes.
>>
>> This is completely broken for two reasons:
>>
>> 1. It relies on "unique=0" to determine reloadability. "unique" was
>> originally intended (and is widely used by existing resource agents) as
>> a hint to UIs to indicate which parameters uniquely determine a resource
>> instance. That is, two resource instances should never have the same
>> value of a "unique" parameter. For this purpose, it makes perfect sense
>> that (for example) the path to a binary command would have unique=0 --
>> multiple resource instances could (and likely would) use the same
>> binary. However, such a parameter could never be reloadable.
>>
>> 2. Every known resource agent that implements a reload action does so
>> incorrectly. Pacemaker uses reload for changes in the resource's
>> *pacemaker* configuration, while all known RAs use reload for a
>> service's native reload capability of its own configuration file. As an
>> example, the ocf:heartbeat:named RA calls "rndc reload" for its reload
>> action, which will have zero effect on any pacemaker-configured
>> parameters -- and on top of that, the RA uses "unique=0" in its correct
>> UI sense, and none of those parameters are actually reloadable.
>>
>> My proposed solution is:
>>
>> * Add a new "reloadable" attribute for resource agent metadata, to
>> indicate reloadable parameters. Pacemaker would use this instead of
>> "unique".
>>
>> * Add a new "reload-options" RA action for the ability to reload
>> Pacemaker-configured options. Pacemaker would call this instead if "reload".
>>
>> * Formalize that "reload" means reload the service's own configuration,
>> legitimizing the most common existing RA implementations. (Pacemaker
>> itself will not use this, but tools such as crm_resource might.)
>>
>> * Review all ocf:pacemaker and ocf:heartbeat agents to make sure they
>> use unique, reloadable, reload, and reload-options properly.
>>
>> The downside is that this breaks backward compatibility. Any RA that
>> actually implements unique and reload so that reload works will lose
>> reload capability until it is updated to the new style.
>>
>> While we usually go to great lengths to preserve backward compatibility,
>> I think it is OK to break it in this case, because most RAs that
>> implement reload do so wrongly: some implement it as a service reload, a
>> few advertise reload but don't actually implement it, and others map
>> reload to start, which might theoretically work in some cases (I'm not
>> familiar enough with iSCSILogicalUnit and iSCSITarget to be sure), but
>> typically won't, as the previous service options are not reverted (for
>> example, I think Route would incorrectly leave the old route in the old
>> table).
>>
>> So, I think breaking backward compatibility is actually a good thing
>> here, since the most reload can do with existing RAs is trigger bad
>> behavior.
>>
>> The opposing view would be that we shouldn't punish any RA writer who
>> implemented this correctly. However, there's no solution that preserves
>> backward compatibility with both UI usage of unique and reload usage of
>> unique. Plus, the worst that would happen is that the RA would stop
>> being reloadable -- not as bad as the current possibilities from
>> mis-implemented reload.
>>
>> My questions are:
>>
>> Does anyone know of an RA that uses reload correctly? Dummy doesn't
>> count ;-)
> 
> Guess I've done that correctly in the past. (taking the parameters in
> the RA and passing them to a running daemon via DBus e.g.)
> 'Correctly' at least as much as possible because I remember I
> had to remove the enforcement of the 2nd meaning of unique
> from crmsh.
> Just nothing of it was useful in a generic way so it is not hosted
> publicly - although available on request of course.
> Just mentioning that because I could imagine that other RAs
> like that might exist, that just never made it to the public
> while still useful in the environment they were made for.

Agreed, custom agents are my main concern.

However, I'm still inclined to break compatibility for these reasons:

* All public RAs that I've looked at implement "reload" incorrectly, and
those are in much wider use.

* Even custom RAs that implement reload correctly will run into the
"unique" problem.

* Breaking compatibility simply means the parameters are no longer
reloadable, whereas maintaining compatibility means that the currently
broken RAs have the chance of seriously buggy behavior that could break
services.

> One way to reward implementers of the 'correct way'
> might be to keep the old behavior in cases where
> reload is implemented and no parameter has reloadable set.
> A big warning in the logs would probably still be advisable.

As mentioned elsewhere, I'm wary of varying behavior in such a way,
because it could make troubleshooting more difficult. Agent behavior
could vary depending on changes in the resource configuration, and the
effects of a stable resource configuration could vary depending on (not
obviously relevant) changes in the agent.

>>
>> Does anyone object to the (backward-incompatible) solution proposed here?