[ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

Ken Gaillot kgaillot at redhat.com
Thu Feb 27 17:43:41 EST 2020


On Thu, 2020-02-27 at 20:42 +0200, Strahil Nikolov wrote:
> On February 27, 2020 7:00:36 PM GMT+02:00, Ken Gaillot <
> kgaillot at redhat.com> wrote:
> > On Thu, 2020-02-27 at 17:28 +0100, Jehan-Guillaume de Rorthais
> > wrote:
> > > On Thu, 27 Feb 2020 09:48:23 -0600
> > > Ken Gaillot <kgaillot at redhat.com> wrote:
> > > 
> > > > On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais
> > > > wrote:
> > > > > On Thu, 27 Feb 2020 12:24:46 +0100
> > > > > "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> > > > >   
> > > > > > > > > Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote
> > > > > > > > > on 27.02.2020 at 11:05 in message
> > > > > > > > > <20200227110502.3624cb87 at firost>:
> > > > > > 
> > > > > > [...]  
> > > > > > > What about something like "lock-location=bool" and    
> > > > > > 
> > > > > > For "lock-location" I would assume the value is a
> > > > > > "location". I
> > > > > > guess you
> > > > > > wanted a "use-lock-location" Boolean value.  
> > > > > 
> > > > > Mh, maybe "lock-current-location" would better reflect what I
> > > > > meant.
> > > > > 
> > > > > The point is to lock the resource on the node currently
> > > > > running
> > > > > it.  
> > > > 
> > > > Though it only applies for a clean node shutdown, so that has
> > > > to be
> > > > in
> > > > the name somewhere. The resource isn't locked during normal
> > > > cluster
> > > > operation (it can move for resource or node failures, load
> > > > rebalancing,
> > > > etc.).
> > > 
> > > Well, I was trying to make the new feature a bit broader than just
> > > the narrow shutdown case.
> > > 
> > > Speaking of shutdown, what is the status of a clean shutdown of the
> > > whole cluster handled by Pacemaker? Currently, I advise stopping
> > > resources gracefully (e.g. using pcs resource disable [...]) before
> > > shutting down each node, either by hand or with some higher-level
> > > tool (e.g. pcs cluster stop --all).
> > 
> > I'm not sure why that would be necessary. It should be perfectly
> > fine
> > to stop pacemaker in any order without disabling resources.
> > 
> > Start-up is actually more of an issue ... if you start corosync and
> > pacemaker on nodes one by one, and you're not quick enough, then
> > once
> > quorum is reached, the cluster will fence all the nodes that
> > haven't
> > yet come up. So on start-up, it makes sense to start corosync on
> > all
> > nodes, which will establish membership and quorum, then start
> > pacemaker
> > on all nodes. Obviously that can't be done within pacemaker so that
> > has
> > to be done manually or by a higher-level tool.
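> > 
> > As a rough sketch (systemd units, hypothetical node names node1-3),
> > the two-phase start looks something like:
> > 
> >   # phase 1: establish membership and quorum everywhere
> >   for n in node1 node2 node3; do ssh "$n" systemctl start corosync; done
> >   # phase 2: start resource management everywhere
> >   for n in node1 node2 node3; do ssh "$n" systemctl start pacemaker; done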
> > 
> > > Shouldn't this feature be discussed in this context as well?
> > > 
> > > [...] 
> > > > > > > it would lock the resource location (unique or clones) until
> > > > > > > the operator unlocks it or the "lock-location-timeout"
> > > > > > > expires, no matter what happens to the resource, maintenance
> > > > > > > mode or not.
> > > > > > > 
> > > > > > > At first look, it seems to pair nicely with maintenance-mode
> > > > > > > and would avoid resource migration after a node reboot.
> > > > 
> > > > Maintenance mode is useful if you're updating the cluster stack
> > > > itself
> > > > -- put in maintenance mode, stop the cluster services (leaving
> > > > the
> > > > managed services still running), update the cluster services,
> > > > start
> > > > the
> > > > cluster services again, take out of maintenance mode.
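> > > > 
> > > > Roughly, with pcs (the property is cluster-wide, the service
> > > > stop/start happens on the node being updated):
> > > > 
> > > >   pcs property set maintenance-mode=true
> > > >   systemctl stop pacemaker corosync    # managed services keep running
> > > >   # ... update the cluster packages ...
> > > >   systemctl start corosync pacemaker
> > > >   pcs property set maintenance-mode=false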
> > > > 
> > > > This is useful if you're rebooting the node for a kernel update
> > > > (for
> > > > example). Apply the update, reboot the node. The cluster takes
> > > > care
> > > > of
> > > > everything else for you (stop the services before shutting down
> > > > and
> > > > do
> > > > not recover them until the node comes back).
> > > 
> > > I'm a bit lost. If a resource doesn't move during maintenance mode,
> > > could you detail a scenario where we should ban it explicitly from
> > > the other nodes to secure its current location when coming out of
> > > maintenance? Isn't that an
> > 
> > Sorry, I was unclear -- I was contrasting maintenance mode with
> > shutdown locks.
> > 
> > You wouldn't need a ban with maintenance mode. However maintenance
> > mode
> > leaves any active resources running. That means the node shouldn't
> > be
> > rebooted in maintenance mode, because those resources will not be
> > cleanly stopped.
> > 
> > With shutdown locks, the active resources are cleanly stopped. That
> > does require a ban of some sort because otherwise the resources
> > will be
> > recovered on another node.
> > 
> > > excessive precaution? Is it just to keep it from moving somewhere
> > > else when exiting maintenance-mode? If the resource has a preferred
> > > node, I suppose the location constraint should take care of this,
> > > shouldn't it?
> > 
> > Having a preferred node doesn't prevent the resource from starting
> > elsewhere if the preferred node is down (or in standby, or
> > otherwise
> > ineligible to run the resource). Even a +INFINITY constraint allows
> > recovery elsewhere if the node is not available. To keep a resource
> > from being recovered, you have to put a ban (-INFINITY location
> > constraint) on any nodes that could otherwise run it.
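> > 
> > In pcs terms (hypothetical resource and node names), the difference
> > is roughly:
> > 
> >   # preference: run on node1 when possible, recovery elsewhere allowed
> >   pcs constraint location my-rsc prefers node1=INFINITY
> > 
> >   # ban: never run on node2, even if no other node is available
> >   pcs constraint location my-rsc avoids node2=INFINITY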
> > 
> > > > > > I wonder: how is this different from a time-limited "ban" (that
> > > > > > wording also already exists)? If you ban all resources from
> > > > > > running on a specific node, resources would move away, and when
> > > > > > booting the node, resources won't come back.
> > > > 
> > > > It actually is equivalent to this process:
> > > > 
> > > > 1. Determine what resources are active on the node about to be
> > > > shut
> > > > down.
> > > > 2. For each of those resources, configure a ban (location
> > > > constraint
> > > > with -INFINITY score) using a rule where node name is not the
> > > > node
> > > > being shut down.
> > > > 3. Apply the updates and reboot the node. The cluster will stop
> > > > the
> > > > resources (due to shutdown) and not start them anywhere else
> > > > (due
> > > > to
> > > > the bans).
> > > 
> > > In maintenance mode, this would not move either.
> > 
> > The problem with maintenance mode for this scenario is that the
> > reboot
> > would uncleanly terminate any active resources.
> > 
> > > > 4. Wait for the node to rejoin and the resources to start on it
> > > > again,
> > > > then remove all the bans.
> > > > 
> > > > The advantage is automation, and in particular the sysadmin
> > > > applying
> > > > the updates doesn't need to even know that the host is part of
> > > > a
> > > > cluster.
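> > > > 
> > > > For reference, the manual equivalent of steps 1-2 would look
> > > > roughly like this with pcs (hypothetical resource and node names):
> > > > 
> > > >   # 1. see what is currently active on node1
> > > >   crm_mon --one-shot
> > > >   # 2. ban each such resource from any node that is not node1
> > > >   pcs constraint location my-rsc rule score=-INFINITY '#uname' ne node1
> > > > 
> > > > which is exactly the tedium (and race window) the new option is
> > > > meant to remove.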
> > > 
> > > Could you elaborate? I suppose the operator still needs to issue a
> > > command to set the shutdown-lock before the reboot, don't they?
> > 
> > Ah, no -- this is intended as a permanent cluster configuration
> > setting, always in effect.
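> > 
> > Concretely, once 2.0.4 is out it should be something along these
> > lines (exact pcs spelling may vary):
> > 
> >   pcs property set shutdown-lock=true
> >   # optionally cap how long a lock may be held
> >   pcs property set shutdown-lock-limit=30min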
> > 
> > > Moreover, if shutdown-lock is just a matter of setting ±INFINITY
> > > constraints on nodes, maybe a higher-level tool could take care of
> > > this?
> > 
> > In this case, the operator applying the reboot may not even know
> > what
> > pacemaker is, much less what command to run. The goal is to fully
> > automate the process so a cluster-aware administrator does not need
> > to
> > be present.
> > 
> > I did consider a number of alternative approaches, but they all had
> > problematic corner cases. For a higher-level tool or anything
> > external
> > to pacemaker, one such corner case is a "time-of-check/time-of-use"
> > problem -- determining the list of active resources has to be done
> > separately from configuring the bans, and it's possible the list
> > could
> > change in the meantime.
> > 
> > > > > This is the standby mode.  
> > > > 
> > > > Standby mode will stop all resources on a node, but it doesn't
> > > > prevent
> > > > recovery elsewhere.
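> > > > 
> > > > For comparison (hypothetical node name):
> > > > 
> > > >   pcs node standby node1      # stops resources on node1, but they
> > > >                               # may be recovered on other nodes
> > > >   pcs node unstandby node1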
> > > 
> > > Yes, I was just commenting on Ulrich's description (historical
> > > context cropped here).
> > 
> > -- 
> > Ken Gaillot <kgaillot at redhat.com>
> > 
> 
> Hi Ken,
> 
> Can you tell me the logic of that feature?
> So far it looks like:
> 1. Mark resources/groups that will be affected by the feature

At this time, there's one cluster-wide setting, so it's all or nothing.

The discussion here lends support to the idea of making it a resource
meta-attribute. Luckily the current design wouldn't interfere with
that, so it could be a future extension.
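
To be clear, no such meta-attribute exists yet; a purely hypothetical
per-resource form might eventually look like

  pcs resource meta my-rsc shutdown-lock=true

whereas what 2.0.4 ships is the single cluster-wide property.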

> 2. Resources/groups are stopped (target-role=stopped)
> 3. The node exits the cluster cleanly when no resources are running
> any more
> 4. The node rejoins the cluster after the reboot
> 5. A positive constraint (on the rebooted node) and negative
> constraints (bans on the rest of the nodes) are created for the
> resources marked in step 1
> 6. target-role is set back to started and the resources are back up
> and running
> 7. When each resource group (or standalone resource) is back online,
> the mark from step 1 is removed and any location constraints (cli-ban
> & cli-prefer) are removed for the resource/group.

Exactly, that's effectively what happens.

The cluster doesn't actually have to touch target-role directly,
because the shutdown and bans ensure the resource will be stopped for
lack of available nodes.
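
While a lock is in effect, the plan is for crm_mon to show it in the
status display, and for an administrator to be able to drop a lock
early with a manual refresh, something like (hypothetical resource and
node names):

  crm_resource --refresh --resource my-rsc --node node1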

> 
> Yet, if that feature attracts more end users (or even enterprises), I
> think it will be positive for the stack.
> 
> Best Regards,
> Strahil Nikolov
-- 
Ken Gaillot <kgaillot at redhat.com>


