[ClusterLabs] Doing reload right

Wed Jul 20 18:13:51 UTC 2016

On 07/20/2016 11:47 AM, Adam Spiers wrote:
> Ken Gaillot <kgaillot at redhat.com> wrote:
>> Hello all,
>>
>> I've been meaning to address the implementation of "reload" in Pacemaker
>> for a while now, and I think the next release will be a good time, as it
>> seems to be coming up more frequently.
> 
> [snipped]
> 
> I don't want to comment directly on any of the excellent points which
> have been raised in this thread, but it seems like a good time to make
> a plea for easier reload / restart of individual instances of cloned
> services, one node at a time.  Currently, when all nodes are managed by
> a configuration management system (such as Chef in our case) and that
> system needs to perform a configuration run on a given node (e.g. to
> update a service's configuration file from a template), the entire
> node must be placed in maintenance mode before the service can be
> reloaded or restarted on it.  This works, but
> can result in ugly effects such as the node getting stuck in
> maintenance mode if the chef-client run failed, without any easy way
> to track down the original cause.
> 
> I went through several design iterations before settling on this
> approach, and they are detailed in a lengthy comment here, which may
> help you better understand the challenges we encountered:
> 
>   https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb#L61

Wow, that is a lot of hard-earned wisdom. :-)

I don't think the problem is restarting individual clone instances. You
can already restart an individual clone instance by unmanaging the
resource and disabling any monitors on it, then using crm_resource
--force-* (for example, --force-stop followed by --force-start) on the
desired node.

The problem (for your use case) is that is-managed is a cluster-wide
setting for a given resource. I suspect coming up with a per-node
interface/implementation for is-managed would be difficult.
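
The restart sequence described above could be scripted roughly as follows. This is a minimal Python sketch, not a Pacemaker tool: the helper name and the `my-rsc` resource ID are hypothetical, and it assumes monitors on the resource have already been disabled separately (this sketch does not do that). It unmanages the resource via the is-managed meta-attribute, then uses crm_resource --force-stop and --force-start, which run the agent on the node where the script executes.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    # Execute a command; raise CalledProcessError if it exits non-zero.
    subprocess.run(list(cmd), check=True)


def restart_clone_instance_locally(resource: str, run: Runner = _run) -> List[List[str]]:
    """Hypothetical helper: restart one clone instance on the local node.

    Assumes monitors on `resource` were already disabled beforehand and
    will be re-enabled afterwards; neither step is done here.
    """
    steps = [
        # 1. Unmanage the resource. is-managed is cluster-wide, so this
        #    affects every instance of the clone, not just this node's.
        ["crm_resource", "--resource", resource, "--meta",
         "--set-parameter", "is-managed", "--parameter-value", "false"],
        # 2. Bypass the cluster and stop/start the agent on this node only.
        ["crm_resource", "--resource", resource, "--force-stop"],
        ["crm_resource", "--resource", resource, "--force-start"],
        # 3. Hand the resource back to the cluster.
        ["crm_resource", "--resource", resource, "--meta",
         "--set-parameter", "is-managed", "--parameter-value", "true"],
    ]
    for cmd in steps:
        run(cmd)  # stops at the first failing command
    return steps
```

Run on the target node, e.g. `restart_clone_instance_locally("my-rsc")`. Because each command is checked, a failure stops the sequence and may leave the resource unmanaged for inspection rather than handing a half-restarted instance back to the cluster. The runner is injectable so the commands can be recorded or printed instead of executed.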

If we implement --force-reload, there won't be a problem with reloads,
since unmanaging shouldn't be necessary.

FYI, maintenance mode is supported for Pacemaker Remote nodes as of 1.1.13.
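
For the node-level maintenance workflow in the quoted message, leaving maintenance mode can be guaranteed by wrapping the configuration run. This is a hypothetical Python sketch, assuming the per-node `maintenance` attribute is set and cleared with crm_attribute; the `node_maintenance` helper is illustrative only. The `finally` clause clears the attribute even if the run raises, and the original exception still propagates, so the cause of the failure is not lost.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    # Execute a command; raise CalledProcessError if it exits non-zero.
    subprocess.run(list(cmd), check=True)


@contextmanager
def node_maintenance(node: str, run: Runner = _run) -> Iterator[None]:
    # Put only this node into maintenance mode via its node attribute.
    run(["crm_attribute", "--node", node, "--name", "maintenance",
         "--update", "true"])
    try:
        yield
    finally:
        # Always clear the attribute, even if the body raised, so a failed
        # configuration run cannot leave the node stuck in maintenance mode.
        run(["crm_attribute", "--node", node, "--name", "maintenance",
             "--delete"])
```

Whether to leave maintenance mode automatically after a failed run is a policy choice: if the service might be left half-configured, staying in maintenance until someone investigates may be the safer default.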

> Similar challenges are posed during upgrade of Pacemaker-managed
> OpenStack infrastructure.
> 
> Cheers,
> Adam