[ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

Thu Feb 27 10:48:23 EST 2020

On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais wrote:
> On Thu, 27 Feb 2020 12:24:46 +0100
> "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> 
> > > > > Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on
> > > > > 27.02.2020 at 11:05 in
> > message <20200227110502.3624cb87 at firost>:
> > 
> > [...]
> > > What about something like "lock-location=bool" and
> > 
> > For "lock-location" I would assume the value is a "location". I
> > guess you
> > wanted a "use-lock-location" Boolean value.
> 
> Mh, maybe "lock-current-location" would better reflect what I meant.
> 
> The point is to lock the resource on the node currently running it.

Though it only applies to a clean node shutdown, so that has to be in
the name somewhere. The resource isn't locked during normal cluster
operation (it can move for resource or node failures, load rebalancing,
etc.).

> > > "lock-location-timeout=duration" (for those who like automatic
> > > steps)? I imagine
> > 
> > I'm still unhappy with "lock-location": What is a "location", and
> > what does it
> > mean to be "locked"?
> > Is that fundamentally different from "freeze/frozen" or "ignore"
> > (all those
> > phrases exist already)?
> 
> A "location" defines where a resource is located in the cluster, on
> what node.
> E.g., a location constraint expresses where a resource //can// run:
> 
>   «Location constraints tell the cluster which nodes a resource can
> run on. »
>   
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/_deciding_which_nodes_a_resource_can_run_on.html
> 
> Here, "constraints" applies to a location. So, if you remove this
> constraint,
> the natural definition of a location would be:
> 
>   «A location tells the cluster what node a resource is running on.»
> 
> > > it would lock the resource location (unique or clones) until the
> > > operator
> > > unlocks it or the "lock-location-timeout" expires. No matter what
> > > happens to
> > > the resource, maintenance mode or not.
> > > 
> > > At first look, it seems to pair nicely with maintenance-mode
> > > and avoids
> > > resource migration after node reboot.

Maintenance mode is useful if you're updating the cluster stack itself
-- put in maintenance mode, stop the cluster services (leaving the
managed services still running), update the cluster services, start the
cluster services again, take out of maintenance mode.

Shutdown locks are useful if you're rebooting the node for a kernel
update (for example). Apply the update, reboot the node. The cluster
takes care of everything else for you (stopping the services before
shutting down and not recovering them until the node comes back).

> > I wonder: Where is it different from a time-limited "ban" (wording
> > also exists
> > already)? If you ban all resources from running on a specific node,
> > resources
> > would be moved away, and when booting the node, resources won't come
> > back.

It actually is equivalent to this process:

1. Determine what resources are active on the node about to be shut
down.
2. For each of those resources, configure a ban (location constraint
with -INFINITY score) using a rule where node name is not the node
being shut down.
3. Apply the updates and reboot the node. The cluster will stop the
resources (due to shutdown) and not start them anywhere else (due to
the bans).
4. Wait for the node to rejoin and the resources to start on it again,
then remove all the bans.

The advantage is automation, and in particular the sysadmin applying
the updates doesn't even need to know that the host is part of a
cluster.
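
For illustration, the ban described in step 2 above could be generated
along these lines. This is only a sketch: the function names and the ID
naming scheme are made up for the example, and it assumes the usual CIB
form of a location constraint with a rule (rsc_location / rule /
expression on #uname). It only builds the XML; it does not talk to a
cluster.

```python
import xml.etree.ElementTree as ET


def shutdown_ban_constraint(resource, node):
    """Build a constraint banning `resource` from every node except `node`."""
    cid = f"ban-{resource}-except-{node}"  # hypothetical ID naming scheme
    constraint = ET.Element("rsc_location", {"id": cid, "rsc": resource})
    # -INFINITY score applied wherever the rule expression is true
    rule = ET.SubElement(
        constraint, "rule", {"id": f"{cid}-rule", "score": "-INFINITY"}
    )
    # The rule matches every node whose name is NOT the node being shut down
    ET.SubElement(
        rule,
        "expression",
        {
            "id": f"{cid}-expr",
            "attribute": "#uname",
            "operation": "ne",
            "value": node,
        },
    )
    return ET.tostring(constraint, encoding="unicode")


def bans_for_shutdown(active_resources, node):
    """One ban per resource currently active on the node (steps 1 and 2)."""
    return [shutdown_ban_constraint(rsc, node) for rsc in active_resources]


if __name__ == "__main__":
    # "db", "vip" and "node1" are placeholder names
    for xml in bans_for_shutdown(["db", "vip"], "node1"):
        print(xml)
```

Steps 3 and 4 (reboot, then remove the bans once the resources are back)
are left out; doing all of that by hand is exactly what the feature
saves.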

> This is the standby mode.

Standby mode will stop all resources on a node, but it doesn't prevent
recovery elsewhere.

> Moreover, note that Ken explicitly wrote: «The cluster runs services
> that have
> a preferred node». So if the resource moved elsewhere, the resource
> **must**
> come back.

Right, the point of preventing recovery elsewhere is to avoid the extra
outage:

Without shutdown lock:
1. When node is stopped, resource stops on that node, and starts on
another node. (First outage)
2. When node rejoins, resource stops on the alternate node, and starts
on original node. (Second outage)

With shutdown lock, there's one outage when the node is rebooted, but
then the resource starts on the same node, so there is no second
outage. If the resource start time is much longer (e.g. a half hour for
an extremely large database) than the reboot time (a couple of
minutes), the feature becomes worthwhile.
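
As a rough back-of-the-envelope sketch of that comparison (the function
names are invented for the example, stop times are ignored, and it
assumes the resource prefers its original node so that it does move
back):

```python
def downtime_without_lock(start_min):
    """Two outages: start on the alternate node, then start again on the
    original node once it rejoins."""
    return 2 * start_min


def downtime_with_lock(start_min, reboot_min):
    """One outage: wait for the reboot, then start once on the same node."""
    return reboot_min + start_min


if __name__ == "__main__":
    # Numbers from the example above: 30-minute start, 2-minute reboot
    print(downtime_without_lock(30))   # 60
    print(downtime_with_lock(30, 2))   # 32
```

Under these simplifications the lock comes out ahead whenever the
reboot is shorter than one resource start.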

> > But you want the resources to be down while the node boots, right?
> > How can
> > that concept be "married with" the concept of high availability?
> 
> The point here is to avoid moving resources during planned
> maintenance/downtime
> as it would require longer maintenance duration (thus longer
> downtime) than a
> simple reboot with no resource migration.
> 
> Even a resource in HA can have planned maintenance :)

Right. I jokingly call this feature "medium availability" but really it
is just another way to set a planned maintenance window.

> > "We have a HA cluster and HA resources, but when we boot a node
> > those
> > HA-resources will be down while the node boots." How is that
> > different from
> > not having a HA cluster, or taking those resources temporarily away
> > from the
> > HA cluster? (That was my initial objection: Why not simply ignore
> > resource
> > failures for some time?)

HA recovery is still done for resource failures and node failures, just
not clean node shutdowns. A clean node shutdown is one where the node
notifies the DC that it wants to leave the cluster (which is what
happens in the background when you stop cluster services on a node).
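
To make the distinction concrete, here is a toy model of that rule. It
is not how the scheduler is implemented, and the enum and function
names are invented for the example:

```python
from enum import Enum


class Event(Enum):
    CLEAN_SHUTDOWN = "clean node shutdown"
    NODE_FAILURE = "node failure"
    RESOURCE_FAILURE = "resource failure"


def may_recover_elsewhere(event, shutdown_lock):
    """Return True if the resource may be recovered on another node."""
    # Only a clean shutdown with the lock enabled holds the resource
    # for its original node; every other case gets normal HA recovery.
    if event is Event.CLEAN_SHUTDOWN and shutdown_lock:
        return False
    return True


if __name__ == "__main__":
    for event in Event:
        print(event.value, may_recover_elsewhere(event, shutdown_lock=True))
```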

Also, any other cluster resource management features in use, such as
utilization attributes, placement strategies, node health attributes,
time-based rules, etc., remain in effect.

> Unless I'm wrong, maintenance mode does not secure the current
> location of
> resources after reboots.
-- 
Ken Gaillot <kgaillot at redhat.com>