[ClusterLabs] Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

Thu Feb 27 09:01:50 EST 2020

On Thu, 27 Feb 2020 12:24:46 +0100
"Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote:

> >>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote on 27.02.2020 at
> 11:05 in message <20200227110502.3624cb87 at firost>:
> 
> [...]
> > What about something like "lock-location=bool" and
> 
> For "lock-location" I would assume the value is a "location". I guess you
> wanted a "use-lock-location" Boolean value.

Mh, maybe "lock-current-location" would better reflect what I meant.

The point is to lock the resource on the node currently running it.

> > "lock-location-timeout=duration" (for those who like automatic steps)? I
> > imagine
> 
> I'm still unhappy with "lock-location": What is a "location", and what does it
> mean to be "locked"?
> Is that fundamentally different from "freeze/frozen" or "ignore" (all those
> phrases exist already)?

A "location" defines where a resource is located in the cluster, i.e. on which
node. E.g., a location constraint expresses where a resource //can// run:

  «Location constraints tell the cluster which nodes a resource can run on. »
  https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/_deciding_which_nodes_a_resource_can_run_on.html

Here, "constraint" qualifies "location". So, if you drop the word "constraint",
the natural definition of a location would be:

  «A location tells the cluster which node a resource is running on.»

> > it would lock the resource location (unique or clones) until the operator
> > unlocks it or the "lock-location-timeout" expires. No matter what happens to
> > the resource, maintenance mode or not.
> > 
> > At first look, it seems to pair nicely with maintenance-mode and avoids
> > resource migration after node reboot.
> 
> I wonder: Where is it different from a time-limited "ban" (wording also exists
> already)? If you ban all resources from running on a specific node, resources
> would be moved away, and when booting the node, resources won't come back.

This is the standby mode.

Moreover, note that Ken explicitly wrote: «The cluster runs services that have
a preferred node». So if the resource moved elsewhere, the resource **must**
come back.

> But you want the resources to be down while the node boots, right? How can
> that concept be "married with" the concept of high availability?

The point here is to avoid moving resources during planned maintenance/downtime
as it would require longer maintenance duration (thus longer downtime) than a
simple reboot with no resource migration.

Even a resource in HA can have planned maintenance :)

> "We have a HA cluster and HA resources, but when we boot a node those
> HA-resources will be down while the node boots." How is that different from
> not having a HA cluster, or taking those resources temporarily away from the
> HA cluster? (That was my initial objection: Why not simply ignore resource
> failures for some time?)

Unless I'm wrong, maintenance mode does not secure the current location of
resources after reboots.
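
As an illustration only, the semantics discussed in this thread (keep a
resource on its current node until the operator unlocks it or a timeout
expires) could be sketched as a toy model. Everything below is hypothetical:
the class, function and node names are invented for the example, and this is
not how Pacemaker implements anything.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LocationLock:
    """Toy model of the proposed "lock-current-location" idea (hypothetical)."""
    node: str                        # node running the resource when locked
    locked_at: float                 # time the lock was taken, in seconds
    timeout: Optional[float] = None  # "lock-location-timeout"; None = no expiry

    def is_active(self, now: float) -> bool:
        # The lock holds until the operator removes it or the timeout expires.
        return self.timeout is None or now - self.locked_at < self.timeout


def allowed_nodes(online_nodes: List[str],
                  lock: Optional[LocationLock],
                  now: float) -> List[str]:
    """Return the nodes where the resource may be started."""
    if lock is not None and lock.is_active(now):
        # Locked: only the locked node qualifies. If it is down (rebooting),
        # the resource stays stopped instead of moving elsewhere.
        return [n for n in online_nodes if n == lock.node]
    return list(online_nodes)


lock = LocationLock(node="node1", locked_at=0.0, timeout=600.0)
print(allowed_nodes(["node2"], lock, now=60.0))            # node1 is rebooting
print(allowed_nodes(["node1", "node2"], lock, now=120.0))  # node1 is back
print(allowed_nodes(["node2"], lock, now=900.0))           # timeout has expired
```

In this sketch the resource is simply not placed anywhere while its locked
node is down, which is the trade-off debated above: a short planned outage
instead of a migration.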