[ClusterLabs] Doing reload right

Fri Jul 22 00:46:21 UTC 2016

On Fri, Jul 22, 2016 at 1:48 AM, Adam Spiers <aspiers at suse.com> wrote:
> Ken Gaillot <kgaillot at redhat.com> wrote:
>> On 07/20/2016 07:32 PM, Andrew Beekhof wrote:
>> > On Thu, Jul 21, 2016 at 2:47 AM, Adam Spiers <aspiers at suse.com> wrote:
>> >> Ken Gaillot <kgaillot at redhat.com> wrote:
>> >>> Hello all,
>> >>>
>> >>> I've been meaning to address the implementation of "reload" in Pacemaker
>> >>> for a while now, and I think the next release will be a good time, as it
>> >>> seems to be coming up more frequently.
>> >>
>> >> [snipped]
>> >>
>> >> I don't want to comment directly on any of the excellent points which
>> >> have been raised in this thread, but it seems like a good time to make
>> >> a plea for easier reload / restart of individual instances of cloned
>> >> services, one node at a time.  Currently, if nodes are all managed by
>> >> a configuration management system (such as Chef in our case),
>> >
>> > Puppet creates the same kinds of issues.
>> > Both seem designed for a magical world full of unrelated servers that
>> > require no co-ordination to update.
>> > Particularly when the timing of an update to some central store (cib,
>> > database, whatever) needs to be carefully ordered.
>> >
>> > When you say "restart" though, is that a traditional stop/start cycle
>> > in Pacemaker that also results in all the dependencies being stopped
>> > too?
>
> No, just the service reload or restart without causing any cascading
> effects in Pacemaker.
>
>> > I'm guessing you really want the "atomic reload" kind where nothing
>> > else is affected because we already have the other style covered by
>> > crm_resource --restart.
>>
>> crm_resource --restart isn't sufficient for his use case because it
>> affects all clone instances cluster-wide, whereas he needs to reload or
>> restart (depending on the service) the local instance only.

Isn't that what I said?  That --restart does a version that he doesn't want?

> Exactly.
>
>> > I propose that we introduce a --force-restart option for crm_resource which:
>> >
>> > 1. disables any recurring monitor operations
>>
>> None of the other --force-* options disable monitors, so for
>> consistency, I think we should leave this to the user (or add it for
>> other --force-*).

No.  There is no other way to reliably achieve a restart than to
disable the monitors first so that they don't detect a transient
state.  Especially if the resource doesn't advertise a restart
command.

>>
>> > 2. calls a native restart action directly on the resource if it
>> > exists, otherwise calls the native stop+start actions
>>
>> What do you mean by native restart action? Systemd restart?

Whatever the agent supports.

>>
>> > 3. re-enables the recurring monitor operations regardless of whether
>> > the restart succeeds, fails, or times out, etc
>> >
>> > No maintenance mode required, and whatever state the resource ends up
>> > in is re-detected by the cluster in step 3.
>>
>> If you're lucky :-)
>>
>> The cluster may still mess with the resource even without monitors, e.g.
>> a dependency fails or a preferred node comes online.

Can you explain how neither of those results in a restart of the service?

>> Maintenance
>> mode/unmanaging would still be safer (though no --force-* option is
>> completely safe, besides check).
>
> I'm happy with whatever you gurus come up with ;-)  I'm just hoping
> that it can be made possible to pinpoint an individual resource on an
> individual node, rather than having to toggle maintenance flag(s)
> across a whole set of clones, or a whole node.

Yep.
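
For what it's worth, the control flow of the three-step --force-restart
proposal quoted above could be sketched roughly like this. It is only an
illustration of the ordering, not how crm_resource is implemented: every
name here (force_restart, disable_monitors, enable_monitors, stop, start,
restart) is a hypothetical placeholder for whatever the cluster and the
resource agent actually provide, and timeouts are not modelled.

```python
from typing import Callable, Optional


def force_restart(
    disable_monitors: Callable[[], None],
    enable_monitors: Callable[[], None],
    stop: Callable[[], None],
    start: Callable[[], None],
    restart: Optional[Callable[[], None]] = None,
) -> None:
    """Restart one resource instance with its recurring monitors paused.

    All callables are hypothetical stand-ins supplied by the caller.
    """
    # Step 1: pause recurring monitors so they cannot observe the
    # transient "stopped" state in the middle of the restart.
    disable_monitors()
    try:
        # Step 2: prefer the agent's own restart action when it has one,
        # otherwise fall back to stop followed by start.
        if restart is not None:
            restart()
        else:
            stop()
            start()
    finally:
        # Step 3: always re-enable the monitors, whether step 2 succeeded
        # or raised, so the resulting state gets re-detected.
        enable_monitors()
```

The try/finally is the point of the sketch: step 3 runs even when the stop
or start fails, and the failure is still propagated to the caller.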
