[ClusterLabs] Re: [EXT] Coming in Pacemaker 2.0.4: shutdown locks

Wed Feb 26 04:33:02 EST 2020

>>> Ken Gaillot <kgaillot at redhat.com> wrote on 25.02.2020 at 23:30 in
message
<29058_1582669837_5E55A00B_29058_3341_1_f8e8426d0c2cf098f88fb6330e8a80586f03043a
camel at redhat.com>:
> Hi all,
> 
> We are a couple of months away from starting the release cycle for
> Pacemaker 2.0.4. I'll highlight some new features between now and then.
> 
> First we have shutdown locks. This is a narrow use case that I don't
> expect a lot of interest in, but it helps give pacemaker feature parity
> with proprietary HA systems, which can help users feel more comfortable
> switching to pacemaker and open source.
> 
> The use case is a large organization with few cluster experts and many
> junior system administrators who reboot hosts for OS updates during
> planned maintenance windows, without any knowledge of what the host
> does. The cluster runs services that have a preferred node and take a
> very long time to start.
> 
> In this scenario, pacemaker's default behavior of moving the service to
> a failover node when the node shuts down, and moving it back when the
> node comes back up, results in needless downtime compared to just
> leaving the service down for the few minutes needed for a reboot.
> 
> The goal could be accomplished with existing pacemaker features.
> Maintenance mode wouldn't work because the node is being rebooted. But
> you could figure out what resources are active on the node, and use a
> location constraint with a rule to ban them on all other nodes before
> shutting down. That's a lot of work for something the cluster can
> figure out automatically.
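
A minimal sketch of the manual workaround described above, for one resource
whose location is already known. It assumes the standard cibadmin tool run as
root on a cluster node; the function names, the constraint ID scheme, and the
example names big-db and node1 are hypothetical.

```shell
#!/bin/sh
# Sketch only: ban a resource from every node except the one about to reboot.
# The "keep-<rsc>-on-<node>" ID scheme is an illustrative assumption.

ban_elsewhere() {
    rsc="$1"    # resource ID
    node="$2"   # node that is about to be rebooted
    id="keep-${rsc}-on-${node}"
    # Location constraint with a rule: score -INFINITY wherever #uname != node
    xml=$(printf '<rsc_location id="%s" rsc="%s"><rule id="%s-rule" score="-INFINITY"><expression id="%s-expr" attribute="#uname" operation="ne" value="%s"/></rule></rsc_location>' \
        "$id" "$rsc" "$id" "$id" "$node")
    cibadmin --create --scope constraints --xml-text "$xml"
}

unban_elsewhere() {
    # Remove the constraint again once the node has rejoined the cluster
    cibadmin --delete --xml-text "<rsc_location id=\"keep-$1-on-$2\"/>"
}

# Example:
#   ban_elsewhere big-db node1     # before the reboot
#   unban_elsewhere big-db node1   # after node1 is back
```

This has to be repeated for every resource active on the node, which is the
bookkeeping the new property is meant to take over.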
> 
> Pacemaker 2.0.4 will offer a new cluster property, shutdown-lock,
> defaulting to false to keep the current behavior. If shutdown-lock is
> set to true, any resources active on a node when it is cleanly shut
> down will be "locked" to the node (kept down rather than recovered
> elsewhere). Once the node comes back up and rejoins the cluster, they
> will be "unlocked" (free to move again if circumstances warrant).
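
A minimal sketch of switching the property on, assuming Pacemaker 2.0.4 or
later and the standard crm_attribute tool run as root on a cluster node. The
wrapper function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: set or query the shutdown-lock cluster property.

set_shutdown_lock() {
    # $1: "true" to lock resources to a cleanly shut down node, "false" for
    #     the default behavior (recover elsewhere)
    crm_attribute --type crm_config --name shutdown-lock --update "$1"
}

show_shutdown_lock() {
    crm_attribute --type crm_config --name shutdown-lock --query
}

# Example:
#   set_shutdown_lock true
#   show_shutdown_lock
```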

I'm not very happy with the wording. What about a per-resource setting,
"tolerate-downtime", that specifies how long the resource may be down before
the cluster takes any action? I think that would be more useful than a
global setting. It could be complemented by a per-node setting with the same
name.
I also think it's very important to specify and document this mode in
comparison to maintenance mode.

Regards,
Ulrich

> 
> An additional cluster property, shutdown-lock-limit, allows you to set
> a timeout for the locks so that if the node doesn't come back within
> that time, the resources are free to be recovered elsewhere. This
> defaults to no limit.
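
A minimal sketch of setting the timeout, under the same assumptions
(crm_attribute run as root on a cluster node, hypothetical function names).
The 30min value is only an example of a Pacemaker duration.

```shell
#!/bin/sh
# Sketch only: bound how long a shutdown lock may hold.

set_shutdown_lock_limit() {
    # $1: duration, e.g. "30min"
    crm_attribute --type crm_config --name shutdown-lock-limit --update "$1"
}

clear_shutdown_lock_limit() {
    # Deleting the property falls back to the default (no limit)
    crm_attribute --type crm_config --name shutdown-lock-limit --delete
}

# Example:
#   set_shutdown_lock_limit 30min
```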
> 
> If you decide while the node is down that you need the resource to be
> recovered, you can manually clear a lock with "crm_resource --refresh"
> specifying both --node and --resource.
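
A minimal sketch of that manual clear, wrapping the command quoted above. The
function name and the example names big-db and node1 are hypothetical.

```shell
#!/bin/sh
# Sketch only: release one resource's shutdown lock while its node is down.

clear_shutdown_lock() {
    # $1: resource ID, $2: name of the node the resource is locked to
    crm_resource --refresh --resource "$1" --node "$2"
}

# Example:
#   clear_shutdown_lock big-db node1
```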
> 
> There are some limitations using shutdown locks with Pacemaker Remote
> nodes, so I'd avoid that with the upcoming release, though it is
> possible.
> -- 
> Ken Gaillot <kgaillot at redhat.com>
> 