[ClusterLabs] Doing reload right

Thu Jun 30 12:58:07 EDT 2016

Hello all,

I've been meaning to address the implementation of "reload" in Pacemaker
for a while now, and I think the next release will be a good time, as it
seems to be coming up more frequently.

In the current implementation, Pacemaker considers a resource parameter
"reloadable" if the resource agent supports the "reload" action, and the
agent's metadata marks the parameter with "unique=0". If (only) such
parameters get changed in the resource's pacemaker configuration,
pacemaker will call the agent's reload action rather than the
stop-then-start it usually does for parameter changes.

This is completely broken for two reasons:

1. It relies on "unique=0" to determine reloadability. "unique" was
originally intended (and is widely used by existing resource agents) as
a hint to UIs to indicate which parameters uniquely determine a resource
instance. That is, two resource instances should never have the same
value of a "unique" parameter. For this purpose, it makes perfect sense
that (for example) the path to a binary command would have unique=0 --
multiple resource instances could (and likely would) use the same
binary. However, such a parameter could never be reloadable.

2. Every known resource agent that implements a reload action does so
incorrectly. Pacemaker uses reload for changes in the resource's
*pacemaker* configuration, while all known RAs use reload for a
service's native reload capability of its own configuration file. As an
example, the ocf:heartbeat:named RA calls "rndc reload" for its reload
action, which will have zero effect on any pacemaker-configured
parameters -- and on top of that, the RA uses "unique=0" in its correct
UI sense, and none of those parameters are actually reloadable.

My proposed solution is:

* Add a new "reloadable" attribute for resource agent metadata, to
indicate reloadable parameters. Pacemaker would use this instead of
"unique".

* Add a new "reload-options" RA action for the ability to reload
Pacemaker-configured options. Pacemaker would call this instead if "reload".

* Formalize that "reload" means reload the service's own configuration,
legitimizing the most common existing RA implementations. (Pacemaker
itself will not use this, but tools such as crm_resource might.)

* Review all ocf:pacemaker and ocf:heartbeat agents to make sure they
use unique, reloadable, reload, and reload-options properly.

The downside is that this breaks backward compatibility. Any RA that
actually implements unique and reload so that reload works will lose
reload capability until it is updated to the new style.

While we usually go to great lengths to preserve backward compatibility,
I think it is OK to break it in this case, because most RAs that
implement reload do so wrongly: some implement it as a service reload, a
few advertise reload but don't actually implement it, and others map
reload to start, which might theoretically work in some cases (I'm not
familiar enough with iSCSILogicalUnit and iSCSITarget to be sure), but
typically won't, as the previous service options are not reverted (for
example, I think Route would incorrectly leave the old route in the old
table).

So, I think breaking backward compatibility is actually a good thing
here, since the most reload can do with existing RAs is trigger bad
behavior.

The opposing view would be that we shouldn't punish any RA writer who
implemented this correctly. However, there's no solution that preserves
backward compatibility with both UI usage of unique and reload usage of
unique. Plus, the worst that would happen is that the RA would stop
being reloadable -- not as bad as the current possibilities from
mis-implemented reload.

My questions are:

Does anyone know of an RA that uses reload correctly? Dummy doesn't
count ;-)

Does anyone object to the (backward-incompatible) solution proposed here?
-- 
Ken Gaillot <kgaillot at redhat.com>