[ClusterLabs] Doing reload right

Fri Jul 22 17:10:30 EDT 2016

On 07/21/2016 07:46 PM, Andrew Beekhof wrote:
> On Fri, Jul 22, 2016 at 1:48 AM, Adam Spiers <aspiers at suse.com> wrote:
>> Ken Gaillot <kgaillot at redhat.com> wrote:
>>> On 07/20/2016 07:32 PM, Andrew Beekhof wrote:
>>>> On Thu, Jul 21, 2016 at 2:47 AM, Adam Spiers <aspiers at suse.com> wrote:
>>>>> Ken Gaillot <kgaillot at redhat.com> wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> I've been meaning to address the implementation of "reload" in Pacemaker
>>>>>> for a while now, and I think the next release will be a good time, as it
>>>>>> seems to be coming up more frequently.
>>>>>
>>>>> [snipped]
>>>>>
>>>>> I don't want to comment directly on any of the excellent points which
>>>>> have been raised in this thread, but it seems like a good time to make
>>>>> a plea for easier reload / restart of individual instances of cloned
>>>>> services, one node at a time.  Currently, if nodes are all managed by
>>>>> a configuration management system (such as Chef in our case),
>>>>
>>>> Puppet creates the same kinds of issues.
>>>> Both seem designed for a magical world full of unrelated servers that
>>>> require no co-ordination to update.
>>>> Particularly when the timing of an update to some central store (cib,
>>>> database, whatever) needs to be carefully ordered.
>>>>
>>>> When you say "restart" though, is that a traditional stop/start cycle
>>>> in Pacemaker that also results in all the dependencies being stopped
>>>> too?
>>
>> No, just the service reload or restart without causing any cascading
>> effects in Pacemaker.
>>
>>>> I'm guessing you really want the "atomic reload" kind where nothing
>>>> else is affected because we already have the other style covered by
>>>> crm_resource --restart.
>>>
>>> crm_resource --restart isn't sufficient for his use case because it
>>> affects all clone instances cluster-wide, whereas he needs to reload or
>>> restart (depending on the service) the local instance only.
> 
> Isn't that what I said?  That --restart does a version that he doesn't want?
> 
>> Exactly.
>>
>>>> I propose that we introduce a --force-restart option for crm_resource which:
>>>>
>>>> 1. disables any recurring monitor operations
>>>
>>> None of the other --force-* options disable monitors, so for
>>> consistency, I think we should leave this to the user (or add it for
>>> other --force-*).
> 
> No.  There is no other way to reliably achieve a restart than to
> disable the monitors first so that they don't detect a transient
> state.  Especially if the resource doesn't advertise a restart
> command.

I see your point: --force-{stop,demote,promote} can still complete with
monitors running (even if the cluster reverses the result immediately
after), but a stop-start cycle might not even complete before being
disrupted.

>>>
>>>> 2. calls a native restart action directly on the resource if it
>>>> exists, otherwise calls the native stop+start actions
>>>
>>> What do you mean by native restart action? Systemd restart?
> 
> Whatever the agent supports.

Are you suggesting that Pacemaker start checking whether the agent
metadata advertises a "restart" action? Or that it just assume certain
resource classes support restart (e.g. systemd) and others don't (e.g. ocf)?
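
To make the first option concrete, here is a rough Python sketch of what
"checking the metadata" could look like. It is only an illustration, not
how Pacemaker does or would implement this: the function name
advertises_action and the sample XML are made up, and it assumes
OCF-style metadata where each supported action appears as an
<action name="..."> element.

```python
import xml.etree.ElementTree as ET


def advertises_action(metadata_xml: str, action: str) -> bool:
    """Return True if the agent metadata lists an action with this name.

    Hypothetical helper: assumes OCF-style <action name="..."/> elements.
    """
    root = ET.fromstring(metadata_xml)
    return any(elem.get("name") == action for elem in root.iter("action"))


# Made-up metadata for an agent that does not advertise "restart".
SAMPLE = """<resource-agent name="example">
  <actions>
    <action name="start" timeout="20"/>
    <action name="stop" timeout="20"/>
    <action name="monitor" timeout="20" interval="10"/>
  </actions>
</resource-agent>"""

print(advertises_action(SAMPLE, "restart"))  # False
print(advertises_action(SAMPLE, "stop"))     # True
```

With a check like this, the stop+start fallback would apply to any agent
whose metadata lacks "restart", regardless of resource class.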

>>>
>>>> 3. re-enables the recurring monitor operations regardless of whether
>>>> the reload succeeds, fails, or times out, etc
>>>>
>>>> No maintenance mode required, and whatever state the resource ends up
>>>> in is re-detected by the cluster in step 3.
>>>
>>> If you're lucky :-)
>>>
>>> The cluster may still mess with the resource even without monitors, e.g.
>>> a dependency fails or a preferred node comes online.
> 
> Can you explain how neither of those results in a restart of the service?

Unless the resource is unmanaged, the cluster could do something like
move it to a different node, disrupting the local force-restart.

Ideally, we'd be able to disable monitors and unmanage the resource for
the duration of the force-restart, but only on the local node.
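
For reference, the three proposed steps can be sketched in Python as
below. This is only a model of the control flow under discussion, not an
implementation: force_restart and all of its callback parameters are
hypothetical stand-ins for whatever crm_resource would actually do, and
unmanaging the resource is deliberately left out.

```python
from typing import Callable, Optional

Action = Callable[[], None]


def force_restart(
    disable_monitors: Action,
    enable_monitors: Action,
    stop: Action,
    start: Action,
    restart: Optional[Action] = None,
) -> bool:
    """Model of the proposed sequence; every callback is a placeholder."""
    disable_monitors()  # step 1: keep monitors from seeing a transient state
    try:
        if restart is not None:
            restart()  # step 2: use a native restart if the agent has one
        else:
            stop()  # step 2 (fallback): native stop followed by start
            start()
        return True
    except Exception:
        return False  # failure is reported, monitors still come back
    finally:
        enable_monitors()  # step 3: always re-enable, whatever happened


# Example run with callbacks that only record the order of calls.
calls = []
ok = force_restart(
    disable_monitors=lambda: calls.append("monitors off"),
    enable_monitors=lambda: calls.append("monitors on"),
    stop=lambda: calls.append("stop"),
    start=lambda: calls.append("start"),
)
print(ok, calls)
```

The try/finally is the important part: step 3 runs whether the restart
succeeds, fails, or raises, so the cluster re-detects whatever state the
resource ended up in.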

>>> Maintenance
>>> mode/unmanaging would still be safer (though no --force-* option is
>>> completely safe, besides check).
>>
>> I'm happy with whatever you gurus come up with ;-)  I'm just hoping
>> that it can be made possible to pinpoint an individual resource on an
>> individual node, rather than having to toggle maintenance flag(s)
>> across a whole set of clones, or a whole node.
> 
> Yep.