[ClusterLabs] Doing reload right

Thu Jul 21 15:44:36 UTC 2016

Ken Gaillot <kgaillot at redhat.com> wrote:
> On 07/20/2016 11:47 AM, Adam Spiers wrote:
> > Ken Gaillot <kgaillot at redhat.com> wrote:
> >> Hello all,
> >>
> >> I've been meaning to address the implementation of "reload" in Pacemaker
> >> for a while now, and I think the next release will be a good time, as it
> >> seems to be coming up more frequently.
> > 
> > [snipped]
> > 
> > I don't want to comment directly on any of the excellent points which
> > have been raised in this thread, but it seems like a good time to make
> > a plea for easier reload / restart of individual instances of cloned
> > services, one node at a time.  Currently, if nodes are all managed by
> > a configuration management system (such as Chef in our case), when the
> > system wants to perform a configuration run on that node (e.g. when
> > updating a service's configuration file from a template), it is
> > necessary to place the entire node in maintenance mode before
> > reloading or restarting that service on that node.  It works OK, but
> > can result in ugly effects such as the node getting stuck in
> > maintenance mode if the chef-client run fails, without any easy way
> > to track down the original cause.
> > 
> > I went through several design iterations before settling on this
> > approach, and they are detailed in a lengthy comment here, which may
> > help you better understand the challenges we encountered:
> > 
> >   https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb#L61
> 
> Wow, that is a lot of hard-earned wisdom. :-)

Yep 8-/

> I don't think the problem is restarting individual clone instances. You
> can already restart an individual clone instance, by unmanaging the
> resource and disabling any monitors on it, then using crm_resource
> --force-* on the desired node.
> 
> The problem (for your use case) is that is-managed is cluster-wide for
> the given resource.

Exactly.
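
For anyone following along, here is a rough sketch in Python of the sequence described above. It only builds the command lines and runs nothing. The resource name is a placeholder, and the helper name restart_plan is made up for illustration. I am assuming the long-form crm_resource options shown here, so check them against crm_resource --help on your version. Disabling the monitors is left as a comment because how you do that depends on your tooling.

```python
from typing import List


def restart_plan(resource: str) -> List[List[str]]:
    """Return, in order, the crm_resource invocations for restarting one
    instance of a resource on the local node. Nothing is executed here."""

    def set_managed(value: str) -> List[str]:
        # is-managed is a resource meta-attribute, so this change is
        # cluster-wide for the resource, not limited to one node.
        return [
            "crm_resource", "--resource", resource, "--meta",
            "--set-parameter", "is-managed", "--parameter-value", value,
        ]

    return [
        set_managed("false"),
        # (Disable any monitor operations on the resource at this point,
        # using whatever tool you normally use to edit the configuration.)
        # The --force-* options act on the node where the command is run.
        ["crm_resource", "--resource", resource, "--force-stop"],
        ["crm_resource", "--resource", resource, "--force-start"],
        # Assumption: returning control by setting is-managed back to true;
        # removing the meta-attribute instead may suit your setup better.
        set_managed("true"),
    ]


if __name__ == "__main__":
    # "my-service" is a hypothetical resource name.
    for argv in restart_plan("my-service"):
        print(" ".join(argv))
```

The first and last steps are the ones that affect every node, which is why this approach does not fit a per-node configuration run.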

> I suspect coming up with a per-node
> interface/implementation for is-managed would be difficult.
> 
> If we implement --force-reload, there won't be a problem with reloads,
> since unmanaging shouldn't be necessary.

OK, sounds good :)

> FYI, maintenance mode is supported for Pacemaker Remote nodes as of 1.1.13.

Yep, we're relying on it already!