[ClusterLabs] Doing reload right

Thu Jul 21 11:48:04 EDT 2016

Ken Gaillot <kgaillot at redhat.com> wrote:
> On 07/20/2016 07:32 PM, Andrew Beekhof wrote:
> > On Thu, Jul 21, 2016 at 2:47 AM, Adam Spiers <aspiers at suse.com> wrote:
> >> Ken Gaillot <kgaillot at redhat.com> wrote:
> >>> Hello all,
> >>>
> >>> I've been meaning to address the implementation of "reload" in Pacemaker
> >>> for a while now, and I think the next release will be a good time, as it
> >>> seems to be coming up more frequently.
> >>
> >> [snipped]
> >>
> >> I don't want to comment directly on any of the excellent points which
> >> have been raised in this thread, but it seems like a good time to make
> >> a plea for easier reload / restart of individual instances of cloned
> >> services, one node at a time.  Currently, if nodes are all managed by
> >> a configuration management system (such as Chef in our case),
> > 
> > Puppet creates the same kinds of issues.
> > Both seem designed for a magical world full of unrelated servers that
> > require no co-ordination to update.
> > Particularly when the timing of an update to some central store (cib,
> > database, whatever) needs to be carefully ordered.
> > 
> > When you say "restart" though, is that a traditional stop/start cycle
> > in Pacemaker that also results in all the dependencies being stopped
> > too?

No, just the service reload or restart without causing any cascading
effects in Pacemaker.

> > I'm guessing you really want the "atomic reload" kind where nothing
> > else is affected because we already have the other style covered by
> > crm_resource --restart.
> 
> crm_resource --restart isn't sufficient for his use case because it
> affects all clone instances cluster-wide, whereas he needs to reload or
> restart (depending on the service) the local instance only.

Exactly.

> > I propose that we introduce a --force-restart option for crm_resource which:
> > 
> > 1. disables any recurring monitor operations
> 
> None of the other --force-* options disable monitors, so for
> consistency, I think we should leave this to the user (or add it for
> other --force-*).
>
> > 2. calls a native restart action directly on the resource if it
> > exists, otherwise calls the native stop+start actions
> 
> What do you mean by native restart action? Systemd restart?
> 
> > 3. re-enables the recurring monitor operations regardless of whether
> > the restart succeeds, fails, or times out, etc.
> > 
> > No maintenance mode required, and whatever state the resource ends up
> > in is re-detected by the cluster in step 3.
> 
> If you're lucky :-)
> 
> The cluster may still mess with the resource even without monitors, e.g.
> a dependency fails or a preferred node comes online. Maintenance
> mode/unmanaging would still be safer (though no --force-* option is
> completely safe, besides --force-check).

I'm happy with whatever you gurus come up with ;-)  I'm just hoping
that it can be made possible to pinpoint an individual resource on an
individual node, rather than having to toggle maintenance flag(s)
across a whole set of clones, or a whole node.
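
To make sure I've understood the three-step --force-restart proposal
quoted above, here is a rough sketch of the control flow in Python. It
is only an illustration, not Pacemaker code: force_restart and all of
its callable parameters are hypothetical stand-ins for whatever
crm_resource would do internally, and --force-restart itself is only a
proposal in this thread.

```python
from typing import Callable, Optional


def force_restart(
    disable_monitors: Callable[[], None],
    enable_monitors: Callable[[], None],
    stop: Callable[[], None],
    start: Callable[[], None],
    restart: Optional[Callable[[], None]] = None,
) -> None:
    """Hypothetical sketch of the proposed sequence for one local instance.

    Every argument is an assumed callback supplied by the caller; none of
    them corresponds to a real Pacemaker or crm_resource API.
    """
    # Step 1: disable any recurring monitor operations.
    disable_monitors()
    try:
        # Step 2: use the native restart action if the resource has one,
        # otherwise fall back to the native stop + start actions.
        if restart is not None:
            restart()
        else:
            stop()
            start()
    finally:
        # Step 3: re-enable the monitors whether step 2 succeeded, failed
        # or raised, so the cluster re-detects whatever state resulted.
        enable_monitors()
```

The try/finally is the only part that really matters here: it is what
guarantees step 3 runs "regardless". The sketch does nothing about
Ken's caveat that the cluster may still act on the resource while the
monitors are off, for example if a dependency fails.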