[ClusterLabs] Re: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

Ken Gaillot kgaillot at redhat.com
Thu Feb 27 12:00:36 EST 2020


On Thu, 2020-02-27 at 17:28 +0100, Jehan-Guillaume de Rorthais wrote:
> On Thu, 27 Feb 2020 09:48:23 -0600
> Ken Gaillot <kgaillot at redhat.com> wrote:
> 
> > On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais
> > wrote:
> > > On Thu, 27 Feb 2020 12:24:46 +0100
> > > "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> > >   
> > > > > > > Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on
> > > > > > > 27.02.2020 at 11:05 in message
> > > > > > > <20200227110502.3624cb87 at firost>:
> > > > 
> > > > [...]  
> > > > > What about something like "lock-location=bool" and
> > > > 
> > > > For "lock-location" I would assume the value is a "location". I
> > > > guess you
> > > > wanted a "use-lock-location" Boolean value.  
> > > 
> > > Mh, maybe "lock-current-location" would better reflect what I
> > > meant.
> > > 
> > > The point is to lock the resource on the node currently running
> > > it.  
> > 
> > Though it only applies to a clean node shutdown, so that has to be
> > in the name somewhere. The resource isn't locked during normal
> > cluster operation (it can move for resource or node failures, load
> > rebalancing, etc.).
> 
> Well, I was trying to make the new feature a bit broader than just
> the narrow shutdown case.
> 
> Speaking of shutdown, what is the status of a clean shutdown of the
> whole cluster handled by Pacemaker? Currently, I advise stopping
> resources gracefully (e.g. using pcs resource disable [...]) before
> shutting down each node, either by hand or using some higher-level
> tool (e.g. pcs cluster stop --all).

I'm not sure why that would be necessary. It should be perfectly fine
to stop pacemaker on the nodes in any order without disabling
resources first.

Start-up is actually more of an issue ... if you start corosync and
pacemaker on nodes one by one, and you're not quick enough, then once
quorum is reached, the cluster will fence all the nodes that haven't
yet come up. So on start-up, it makes sense to start corosync on all
nodes, which will establish membership and quorum, then start pacemaker
on all nodes. Obviously that can't be done from within pacemaker, so
it has to be done manually or by a higher-level tool.
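
A minimal sketch of that two-phase start-up, assuming the standard
systemd unit names and three placeholder node names:

    # Phase 1: establish membership and quorum on every node first
    for n in node1 node2 node3; do ssh "$n" systemctl start corosync; done

    # Phase 2: only then start the cluster resource manager everywhere
    for n in node1 node2 node3; do ssh "$n" systemctl start pacemaker; done

A higher-level tool can wrap this up in one command, e.g. "pcs cluster
start --all".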

> Shouldn't this feature be discussed in this context as well?
> 
> [...] 
> > > > > it would lock the resource location (unique or clones) until
> > > > > the operator unlocks it or the "lock-location-timeout"
> > > > > expires, no matter what happens to the resource, maintenance
> > > > > mode or not.
> > > > > 
> > > > > At first glance, it looks like it would pair nicely with
> > > > > maintenance-mode and avoid resource migration after a node
> > > > > reboot.
> > 
> > Maintenance mode is useful if you're updating the cluster stack
> > itself
> > -- put in maintenance mode, stop the cluster services (leaving the
> > managed services still running), update the cluster services, start
> > the
> > cluster services again, take out of maintenance mode.
> > 
> > This is useful if you're rebooting the node for a kernel update
> > (for
> > example). Apply the update, reboot the node. The cluster takes care
> > of
> > everything else for you (stop the services before shutting down and
> > do
> > not recover them until the node comes back).
> 
> I'm a bit lost. If a resource doesn't move during maintenance mode,
> could you detail a scenario where we should ban it explicitly from
> other nodes to secure its current location when getting out of
> maintenance? Isn't it

Sorry, I was unclear -- I was contrasting maintenance mode with
shutdown locks.

You wouldn't need a ban with maintenance mode. However, maintenance
mode leaves any active resources running. That means the node shouldn't be
rebooted in maintenance mode, because those resources will not be
cleanly stopped.

With shutdown locks, the active resources are cleanly stopped. That
does require a ban of some sort because otherwise the resources will be
recovered on another node.
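
To make the contrast concrete, the cluster-stack-update workflow
described above might look roughly like this (assuming pcs and the
standard systemd units; the property is cluster-wide, the service
stops/starts are per node):

    pcs property set maintenance-mode=true    # cluster stops managing resources
    systemctl stop pacemaker                  # resources keep running, unmanaged
    systemctl stop corosync
    # ... update the cluster stack packages ...
    systemctl start corosync
    systemctl start pacemaker
    pcs property set maintenance-mode=false   # resume management

With shutdown locks there is nothing extra to type: stopping pacemaker
(or rebooting) cleanly stops the resources, and the lock keeps them
from being recovered elsewhere.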

> an excessive
> precaution? Is it just to avoid it moving somewhere else when exiting
> maintenance-mode? If the resource has a preferred node, I suppose the
> location constraint should take care of this, shouldn't it?

Having a preferred node doesn't prevent the resource from starting
elsewhere if the preferred node is down (or in standby, or otherwise
ineligible to run the resource). Even a +INFINITY constraint allows
recovery elsewhere if the node is not available. To keep a resource
from being recovered, you have to put a ban (-INFINITY location
constraint) on any nodes that could otherwise run it.
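
For illustration (rsc1, node1 and node2 are placeholder names), such a
ban can be expressed with pcs either against one specific node or, as
in the shutdown case, against every node other than the one currently
hosting the resource:

    # ban rsc1 from node2 outright
    pcs constraint location rsc1 avoids node2

    # ban rsc1 from every node whose name is not node1
    pcs constraint location rsc1 rule score=-INFINITY '#uname' ne node1

crm_resource --ban does much the same thing for the single-node case.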

> > > > I wonder: how is it different from a time-limited "ban" (that
> > > > wording also exists already)? If you ban all resources from
> > > > running on a specific node, resources would be moved away, and
> > > > when booting the node, resources won't come back.
> > 
> > It actually is equivalent to this process:
> > 
> > 1. Determine what resources are active on the node about to be shut
> > down.
> > 2. For each of those resources, configure a ban (location
> > constraint
> > with -INFINITY score) using a rule where node name is not the node
> > being shut down.
> > 3. Apply the updates and reboot the node. The cluster will stop the
> > resources (due to shutdown) and not start them anywhere else (due
> > to
> > the bans).
> 
> In maintenance mode, these resources would not move either.

The problem with maintenance mode for this scenario is that the reboot
would uncleanly terminate any active resources.

> > 4. Wait for the node to rejoin and the resources to start on it
> > again,
> > then remove all the bans.
> > 
> > The advantage is automation, and in particular the sysadmin
> > applying
> > the updates doesn't need to even know that the host is part of a
> > cluster.
> 
> Could you elaborate? I suppose the operator still needs to issue a
> command to set the shutdown-lock before the reboot, don't they?

Ah, no -- this is intended as a permanent cluster configuration
setting, always in effect.
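
Concretely, in 2.0.4 this is the new shutdown-lock cluster property
(with an optional shutdown-lock-limit), set once and left in place. A
rough sketch with pcs (an older pcs may need --force for a property it
does not yet know about; 30min is just an example limit):

    # keep resources locked to their node across clean shutdowns ...
    pcs property set shutdown-lock=true
    # ... but give the lock up if the node stays down too long
    pcs property set shutdown-lock-limit=30min

    # equivalent with the lower-level tool:
    crm_attribute --type crm_config --name shutdown-lock --update true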

> Moreover, if shutdown-lock is just a matter of setting ±INFINITY
> constraints on nodes, maybe a higher-level tool could take care of
> this?

In this case, the operator applying the reboot may not even know what
pacemaker is, much less what command to run. The goal is to fully
automate the process so a cluster-aware administrator does not need to
be present.

I did consider a number of alternative approaches, but they all had
problematic corner cases. For a higher-level tool or anything external
to pacemaker, one such corner case is a "time-of-check/time-of-use"
problem -- determining the list of active resources has to be done
separately from configuring the bans, and it's possible the list could
change in the meantime.

> > > This is the standby mode.  
> > 
> > Standby mode will stop all resources on a node, but it doesn't
> > prevent
> > recovery elsewhere.
> 
> Yes, I was just commenting on Ulrich's description (history context
> cropped here).
-- 
Ken Gaillot <kgaillot at redhat.com>


