[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Coming in Pacemaker 2.0.4: shutdown locks
Ken Gaillot
kgaillot at redhat.com
Fri Feb 28 10:28:53 EST 2020
On Fri, 2020-02-28 at 09:37 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot <kgaillot at redhat.com> wrote on 27.02.2020 at
> > > > 23:43 in message
>
> <43512a11c2ddffbabeee11cf4cb509e4e5dc98ca.camel at redhat.com>:
>
> [...]
> >
> > > 2. Resources/groups are stopped (target-role=stopped)
> > > 3. The node exits the cluster cleanly once no resources are
> > > running any more
> > > 4. The node rejoins the cluster after the reboot
> > > 5. Location constraints are created for the resources marked in
> > > step 1: a positive one on the rebooted node and negative ones
> > > (bans) on the rest of the nodes
> > > 6. target-role is set back to started and the resources come
> > > back up
> > > 7. When each resource group (or standalone resource) is back
> > > online, the mark from step 1 is removed, along with any location
> > > constraints (cli-ban & cli-prefer) for the resource/group.
> >
> > Exactly, that's effectively what happens.
>
> May I ask how robust the mechanism will be?
> For example, with a "resource restart" there are two target roles
> (each made persistent): stopped and started. If the node performing
> the operation is fenced (we have had that a few times), the
> resources may remain "stopped" until started manually again.
> I see a similar risk with this mechanism.
Corner cases were carefully considered with this one. If a node is
fenced, its entire CIB status section is cleared, which will include
shutdown locks. I considered alternative implementations under the
hood, and the main advantage of the one chosen is that setting and
clearing the lock are atomic with recording the action results that
cause them. That eliminates a whole lot of possibilities for the type
of problem you mention. There are also multiple backstops that clear
locks if anything looks fishy: the node being unclean, the resource
somehow having started elsewhere while the lock was in effect, a
locked resource being removed from the configuration while it is
down, and so on.
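
If you want to poke at it, the lock is recorded in the status
section, so a plain CIB query will show it. A rough sketch (the exact
attribute name/format is from memory, so check the 2.0.4 docs rather
than trusting this verbatim):

  # dump the status section, where action results and shutdown locks live
  cibadmin --query --scope status

  # from memory, a locked resource's history entry gets something like
  # shutdown-lock="<epoch timestamp>" on its lrm_resource element, and
  # "clearing the lock" means that attribute going away
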
The one area I don't consider mature yet is Pacemaker Remote nodes. I'd
recommend using the feature only in a cluster without them. This is due
mainly to a (documented) limitation that manual lock clearing and
shutdown-lock-limit only work if the remote connection is disabled
after stopping the node, which sort of defeats the "hands off" goal.
I also think that using locks with remote nodes needs more testing.
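
For reference, on cluster nodes the setup and the manual escape hatch
look roughly like this ("rsc1" and "node1" are placeholder names):

  # enable shutdown locks and cap how long a lock is honored
  crm_attribute --type crm_config --name shutdown-lock --update true
  crm_attribute --type crm_config --name shutdown-lock-limit --update 30min

  # manually clear the lock for one resource on the down node
  # (both the resource and the node must be specified)
  crm_resource --resource rsc1 --refresh --node node1
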
>
> [...]
--
Ken Gaillot <kgaillot at redhat.com>