[ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks
kgaillot at redhat.com
Wed Feb 26 10:41:00 EST 2020
On Wed, 2020-02-26 at 10:33 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot <kgaillot at redhat.com> wrote on 25.02.2020 at
> > > > 23:30:
> > Hi all,
> > We are a couple of months away from starting the release cycle for
> > Pacemaker 2.0.4. I'll highlight some new features between now and
> > then.
> > First we have shutdown locks. This is a narrow use case that I
> > don't expect a lot of interest in, but it helps give Pacemaker
> > feature parity with proprietary HA systems, which can help users
> > feel more comfortable switching to Pacemaker and open source.
> > The use case is a large organization with few cluster experts and
> > many junior system administrators who reboot hosts for OS updates
> > during planned maintenance windows, without any knowledge of what
> > the host does. The cluster runs services that have a preferred node
> > and take a very long time to start.
> > In this scenario, Pacemaker's default behavior of moving the
> > service to a failover node when the node shuts down, and moving it
> > back when the node comes back up, results in needless downtime
> > compared to just leaving the service down for the few minutes
> > needed for a reboot.
> > The goal could be accomplished with existing Pacemaker features.
> > Maintenance mode wouldn't work because the node is being rebooted.
> > But you could figure out which resources are active on the node and
> > use a location constraint with a rule to ban them on all other
> > nodes before shutting down. That's a lot of work for something the
> > cluster can figure out automatically.
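For illustration, the manual workaround described above could look
something like this with crm_resource bans; the resource and node
names here are hypothetical examples:

```shell
# Pre-2.0.4 workaround sketch: before rebooting node1, ban the
# resource from every other node so it stays put (down) during the
# reboot. "big-db", "node2" and "node3" are hypothetical names.
crm_resource --ban --resource big-db --node node2
crm_resource --ban --resource big-db --node node3

# ... reboot node1 for its OS update ...

# Afterwards, remove the temporary constraints so the resource is
# free to move again if circumstances warrant.
crm_resource --clear --resource big-db
```

With more than a couple of nodes, banning the resource everywhere else
quickly becomes tedious, which is exactly the busywork shutdown-lock
automates.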
> > Pacemaker 2.0.4 will offer a new cluster property, shutdown-lock,
> > defaulting to false to keep the current behavior. If shutdown-lock
> > is set to true, any resources active on a node when it is cleanly
> > shut down will be "locked" to the node (kept down rather than
> > recovered elsewhere). Once the node comes back up and rejoins the
> > cluster, they will be "unlocked" (free to move again if
> > circumstances warrant).
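As an aside, enabling the feature would be an ordinary cluster
property update, e.g. with crm_attribute (this assumes a cluster
already running 2.0.4 or later):

```shell
# Enable shutdown locks cluster-wide; the property defaults to false.
crm_attribute --type crm_config --name shutdown-lock --update true

# Check the current value.
crm_attribute --type crm_config --name shutdown-lock --query
```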
> I'm not very happy with the wording: what about a per-resource
> "tolerate-downtime" that specifies how long the resource may be down
> before the cluster takes action? I think it would be more useful than
> some global setting. Maybe complement that per-resource feature with
> a feature using the same name.
I considered a per-resource and/or per-node setting, but the target
audience is someone who wants things as simple as possible. A per-node
setting would mean that newly added nodes don't have it by default,
which could be easily overlooked. (As an aside, I would someday like to
see a "node defaults" section that would provide default values for
node attributes. That could potentially replace several current
cluster-wide options. But it's a low priority.)
I didn't mention this in the announcements, but certain resource types
are never locked: Stonith resources and Pacemaker Remote connection
resources. That makes sense because they are more a sort of internal
pseudo-resource than an actual end-user service. Stonith resources are
just monitors of the fence device, and a connection resource starts a
(remote) node rather than a service.
Also, with the current implementation, clone and bundle instances are
not locked. This would only matter for unique clones and for
clones/bundles with clone-max or replicas set below the total number
of nodes. If there turns out to be high demand, we could add it in the
future. The same goes for the master role of promotable clones.
Given those limitations, I think a per-resource option would have more
potential to be confusing than helpful. But, it should be relatively
simple to extend this as a per-resource option, with the global option
as a backward-compatible default, if the demand arises.
> I think it's very important to specify and document that mode
> comparing it to
> maintenance mode.
The proposed documentation is in the master branch if you want to
proofread it and make suggestions. If you have the prerequisites
installed you can run "make -C doc" and view it locally; otherwise you
can browse the source (search for "shutdown-lock"):
There is currently no explicit comparison with maintenance-mode,
because maintenance-mode still behaves according to its documentation
("Should the cluster refrain from monitoring, starting and stopping
resources?").
However I can see the value in adding a section somewhere (probably in
"Pacemaker Administration") comparing all the various "don't touch"
settings -- maintenance-mode, maintenance node/resource attributes,
standby, is-managed, shutdown-lock, and the monitor enable option. The
current "Monitoring Resources When Administration is Disabled" section
in Pacemaker Explained could be a good starting point for this. Another
item for the to-do list ...
> > An additional cluster property, shutdown-lock-limit, allows you to
> > set a timeout for the locks so that if the node doesn't come back
> > within that time, the resources are free to be recovered elsewhere.
> > This defaults to no limit.
> > If you decide while the node is down that you need the resource to
> > be recovered, you can manually clear a lock with "crm_resource
> > --refresh", specifying both --node and --resource.
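Putting those two together, a sketch of setting a lock timeout and
then manually releasing a lock (the resource and node names are
hypothetical examples):

```shell
# Free locked resources for recovery elsewhere if the node stays
# down longer than 30 minutes (the default is no limit).
crm_attribute --type crm_config --name shutdown-lock-limit --update 30min

# Manually release the lock on a still-down node so the resource
# can be recovered now; "big-db" and "node1" are hypothetical.
crm_resource --refresh --resource big-db --node node1
```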
> > There are some limitations using shutdown locks with Pacemaker
> > Remote nodes, so I'd avoid that with the upcoming release, though
> > it is possible.
> > --
> > Ken Gaillot <kgaillot at redhat.com>
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> > ClusterLabs home: https://www.clusterlabs.org/
Ken Gaillot <kgaillot at redhat.com>