[ClusterLabs] Antw: Re: Antw: Doing reload right

Fri Jul 15 19:34:43 EDT 2016

On 07/14/2016 06:21 PM, Andrew Beekhof wrote:
> On Fri, Jul 15, 2016 at 2:33 AM, Ken Gaillot <kgaillot at redhat.com> wrote:
>> On 07/13/2016 11:20 PM, Andrew Beekhof wrote:
>>> On Wed, Jul 6, 2016 at 12:57 AM, Ken Gaillot <kgaillot at redhat.com> wrote:
>>>> On 07/04/2016 02:01 AM, Ulrich Windl wrote:
>>>>> For the case of changing the contents of an external configuration file, the
>>>>> RA would have to provide some reloadable dummy parameter then (maybe like
>>>>> "config_generation=2").
>>>>
>>>> That is a widely recommended approach for the current "reload"
>>>> implementation, but I don't think it's desirable. It still does not
>>>> distinguish changes in the Pacemaker resource configuration from changes
>>>> in the service configuration.
>>>>
>>>> For example, if an RA has one parameter that is agent-reloadable and
>>>> another that is service-reloadable, and it gets a "reload" action, it
>>>> has no way of knowing which of the two (or both) changed. It would have
>>>> to always reload all agent-reloadable parameters, and trigger a service
>>>> reload. That seems inefficient to me. Also, only Pacemaker should
>>>> trigger agent reloads, and only the user should trigger service reloads,
>>>> so combining them doesn't make sense to me.
>>>
>>> Totally disagree :-)
>>>
>>> The whole reason service reloads exist is that they are more efficient
>>> than a stop/start cycle.
>>>
>>> So I'm not seeing how an occasional unnecessary reload, on the rare
>>> occasion that the parameters change and allow one, can be classed as
>>> inefficient. On the contrary, trying to avoid it seems like
>>> over-optimizing when we should be aiming for correctness, i.e.
>>> reloading the whole thing.
>>
>> I just don't see any logical connection between modifying a service's
>> Pacemaker configuration and modifying its service configuration file.
> 
> There isn't one, beyond the fact that both bypass a stop/start cycle.
> 
>>
>> Is the idea that people will tend to change them together?
> 
> No, the idea is that the "penalty" of making sure both are up-to-date,
> in the rare event that either one is changed, does not justify
> splitting them up.

OK. In that case, we'd keep the "reload" action for doing both types of
reload together, and the only change we need to consider is unique vs
reloadable.

Thinking it through some more, I'm leaning to this approach:

1. Let's buckle down and update the OCF spec to reflect the accrued
real-world practices, as well as this change. This will allow resource
agents to specify that they comply with the new terminology by setting
<version> to 1.1, and both pacemaker and higher-level tools can rely on
that to determine whether to use the new behavior.

The alternative is that pacemaker and higher-level tools could check
whether a resource agent specifies "reloadable" for any parameter, and
use the new behavior if so. It's doable, but it's another hacky
workaround when we're really overdue for this anyway.
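To make the two detection options in point 1 concrete, here is a rough
sketch of each check. It assumes the agent's meta-data output has already
been captured as a string and carries the spec version in a top-level
<version> element; the function names and the sample metadata are made up
for illustration and are not existing pacemaker code.

```python
import xml.etree.ElementTree as ET


def ocf_version(metadata_xml: str) -> tuple:
    """Return the OCF spec version from <version> as a tuple of ints."""
    root = ET.fromstring(metadata_xml)
    # Assumption: a missing <version> is treated as the original 1.0 spec.
    text = (root.findtext("version") or "1.0").strip()
    return tuple(int(part) for part in text.split("."))


def uses_new_semantics(metadata_xml: str) -> bool:
    """Option A: trust the declared spec version (1.1 or later)."""
    return ocf_version(metadata_xml) >= (1, 1)


def declares_reloadable(metadata_xml: str) -> bool:
    """Option B (the workaround): look for any 'reloadable' attribute."""
    root = ET.fromstring(metadata_xml)
    return any("reloadable" in p.attrib for p in root.iter("parameter"))


# Hypothetical metadata fragment, trimmed to the parts that matter here.
SAMPLE = """<resource-agent name="example">
  <version>1.1</version>
  <parameters>
    <parameter name="configfile" reloadable="0"/>
  </parameters>
</resource-agent>"""

if __name__ == "__main__":
    print(uses_new_semantics(SAMPLE), declares_reloadable(SAMPLE))
```

Option A needs only one field and stays correct even for an agent that
has no reloadable parameters at all, which is the case option B cannot
tell apart from an old-style agent.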

2. Since the current usage of "unique" is so broken, I think we should
abandon it altogether, and use two new attribute names to indicate
uniqueness and reloadability. We've already converged on "reloadable",
so we just need something to indicate that two instances of a resource
cannot share the same value of a given parameter. Maybe "reject_duplicate"?

I think it might even be worthwhile for pacemaker (not just high-level
tools) to enforce the new attribute, because its whole purpose is to
flag a problem when the same value is used twice. For example, you can
start two instances of apache with different config files, but you don't
want to try to start two instances with the same config file. We can't
do that currently because unique is often set wrong, but if we create a
new attribute, we can enforce it from the get-go.
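As a sketch of what enforcing this could look like, the check below
groups resource instances by the value of each parameter marked with the
new attribute and reports any value claimed by more than one instance.
"reject_duplicate" is only the tentative name floated above, and the data
layout (a dict of resource IDs to parameter dicts) plus the resource
names and paths are assumptions for illustration.

```python
from collections import defaultdict


def find_duplicates(instances, reject_duplicate_params):
    """Report parameter values shared by more than one resource instance.

    instances: dict mapping resource ID -> dict of parameter name -> value
    reject_duplicate_params: names the agent marked as reject_duplicate
    Returns a list of (parameter, value, [resource IDs]) conflicts.
    """
    seen = defaultdict(list)
    for rsc_id, params in instances.items():
        for name in reject_duplicate_params:
            if name in params:
                seen[(name, params[name])].append(rsc_id)
    return [
        (name, value, sorted(ids))
        for (name, value), ids in sorted(seen.items())
        if len(ids) > 1
    ]


# Hypothetical apache instances: web1 and web3 share a config file.
INSTANCES = {
    "web1": {"configfile": "/etc/httpd/site-a.conf", "port": "80"},
    "web2": {"configfile": "/etc/httpd/site-b.conf", "port": "80"},
    "web3": {"configfile": "/etc/httpd/site-a.conf", "port": "8080"},
}

if __name__ == "__main__":
    for name, value, ids in find_duplicates(INSTANCES, ["configfile"]):
        print(f"{name}={value} is used by {', '.join(ids)}")
```

Here "port" is deliberately left out of the checked list, so web1 and
web2 sharing port 80 is not reported; only parameters the agent marks
are enforced.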

If we don't come up with a new name, I think "unique" becomes completely
unusable -- resource agents couldn't rely on pacemaker or high-level
tools to interpret it consistently, and high-level tools couldn't rely
on resource agents to specify it properly.

3. If a resource agent specifies OCF 1.1 or greater, Pacemaker can look
for reloadability and uniqueness; otherwise, it would never reload or
enforce uniqueness. And, we can add a crm_resource --force-reload option
to do a reload without needing to change a dummy attribute.

The above would let resource agents confidently specify metadata that
can be used with any version of pacemaker or high-level tools. They
could specify OCF 1.1 and the new attribute names, which would be used
by newer pacemaker and ignored by older pacemaker, and if desired, they
could even (continue to) specify unique=0 to indicate reloadability to
older pacemaker.
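To illustrate the compatibility story, here is a sketch of how one
metadata document could be read by an older and a newer consumer. The
attribute names "reloadable" and "reject_duplicate" are the tentative
ones from this thread, the sample agent is invented, and treating a
missing "unique" as 0 on the legacy path is a simplifying assumption
rather than a statement of what any pacemaker release does.

```python
import xml.etree.ElementTree as ET

# Hypothetical agent metadata carrying both the old and the new attributes.
METADATA = """<resource-agent name="example">
  <version>1.1</version>
  <parameters>
    <parameter name="configfile" unique="1" reject_duplicate="1" reloadable="0"/>
    <parameter name="loglevel" unique="0" reject_duplicate="0" reloadable="1"/>
  </parameters>
</resource-agent>"""


def reloadable_params(metadata_xml: str, understands_ocf_1_1: bool) -> list:
    """Return the parameter names a consumer would treat as reloadable."""
    root = ET.fromstring(metadata_xml)
    params = list(root.iter("parameter"))

    if not understands_ocf_1_1:
        # Legacy reading: unique=0 doubles as "reloadable".
        # Assumption: an absent 'unique' is treated like unique=0.
        return [p.get("name") for p in params if p.get("unique", "0") == "0"]

    text = (root.findtext("version") or "1.0").strip()
    version = tuple(int(part) for part in text.split("."))
    if version < (1, 1):
        # Per point 3: a pre-1.1 agent is never reloaded by a newer consumer.
        return []
    return [p.get("name") for p in params if p.get("reloadable") == "1"]


if __name__ == "__main__":
    print("older consumer:", reloadable_params(METADATA, False))
    print("newer consumer:", reloadable_params(METADATA, True))
```

Both readings agree on this sample because the agent keeps unique and
reloadable consistent; an agent that sets them inconsistently would be
interpreted differently by the two generations.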

>> I'd expect
>> that in most environments, the Pacemaker configuration (e.g. where the
>> apache config file is) will remain much more stable than the service
>> configuration (e.g. adding/modifying websites).
>>
>> Service reloads can sometimes be expensive (e.g. a complex/busy postfix
>> or apache installation) even if they are less expensive than a full restart.
> 
> Right. But you just said that the pacemaker config is much less likely
> (out of a thing that's already not very likely) to change. So why are
> you optimizing for that scenario?
> 
>>
>>> The most inefficient part in all this is the current practice of
>>> updating a dummy attribute to trigger a reload after changing the
>>> application config file. That we can address by supporting
>>> --force-reload for crm_resource like we do for start/stop/monitor (and
>>> exposing it nicely in pcs).