[ClusterLabs] Stopping all nodes causes servers to migrate

Ken Gaillot kgaillot at redhat.com
Tue Jan 26 11:27:39 EST 2021


On Tue, 2021-01-26 at 11:03 -0500, Digimer wrote:
> On 2021-01-26 10:15 a.m., Tomas Jelinek wrote:
> > Dne 25. 01. 21 v 17:01 Ken Gaillot napsal(a):
> > > On Mon, 2021-01-25 at 09:51 +0100, Jehan-Guillaume de Rorthais
> > > wrote:
> > > > Hi Digimer,
> > > > 
> > > > On Sun, 24 Jan 2021 15:31:22 -0500
> > > > Digimer <lists at alteeve.ca> wrote:
> > > > [...]
> > > > >    I had a test server (srv01-test) running on node 1
> > > > > (el8-a01n01), and on node 2 (el8-a01n02) I ran
> > > > > 'pcs cluster stop --all'.
> > > > > 
> > > > >    It appears that Pacemaker asked the VM to migrate to node 2
> > > > > instead of stopping it. Once the server was on node 2, I
> > > > > couldn't use 'pcs resource disable <vm>' because it reported
> > > > > that the resource was unmanaged, and the cluster shutdown
> > > > > hung. When I stopped the VM directly and then ran
> > > > > 'pcs resource cleanup', the cluster shutdown completed.
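> > > > > 
> > > > >    For reference, the manual recovery was roughly the
> > > > > following (a sketch; it assumes the VM is a libvirt guest and
> > > > > that the resource is named srv01-test):
> > > > > 
> > > > >    # force-stop the guest outside of cluster control
> > > > >    # (assumption: the VM is managed via libvirt/virsh)
> > > > >    virsh destroy srv01-test
> > > > >    # clear the failed/unmanaged state so the shutdown can finish
> > > > >    pcs resource cleanup srv01-test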
> > > > 
> > > > As actions during a cluster shutdown cannot be handled in the
> > > > same transition for all nodes, I usually add a step to disable
> > > > all resources using the "stop-all-resources" property before
> > > > shutting down the cluster:
> > > > 
> > > >    pcs property set stop-all-resources=true
> > > >    pcs cluster stop --all
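> > > > 
> > > > and then, once the nodes have been started again, something
> > > > like the following (a sketch; without resetting the property,
> > > > nothing will be allowed to start):
> > > > 
> > > >    pcs cluster start --all
> > > >    pcs property set stop-all-resources=false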
> > > > 
> > > > But it seems there's a very new cluster property to handle that
> > > > (IIRC, added one or two releases ago). Look at the
> > > > "shutdown-lock" documentation:
> > > > 
> > > >    [...]
> > > >    some users prefer to make resources highly available only for
> > > >    failures, with no recovery for clean shutdowns. If this option
> > > >    is true, resources active on a node when it is cleanly shut
> > > >    down are kept "locked" to that node (not allowed to run
> > > >    elsewhere) until they start again on that node after it
> > > >    rejoins (or for at most shutdown-lock-limit, if set).
> > > >    [...]
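> > > > 
> > > > So, with a recent enough Pacemaker, the sequence could look
> > > > something like this instead (a sketch; the 10-minute limit is
> > > > just an example value):
> > > > 
> > > >    pcs property set shutdown-lock=true
> > > >    pcs property set shutdown-lock-limit=10min
> > > >    pcs cluster stop --all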
> > > > 
> > > > [...]
> > > > >    So, as best I can tell, Pacemaker really did ask for a
> > > > > migration. Is this the case?
> > > > 
> > > > AFAIK, yes, because each cluster shutdown request is handled
> > > > independently at the node level. This leaves the door wide open
> > > > to all kinds of race conditions if the requests reach the nodes
> > > > with some random lag.
> > > 
> > > I'm going to guess that's what happened.
> > > 
> > > The basic issue is that there is no "cluster shutdown" in
> > > Pacemaker, only "node shutdown". I'm guessing
> > > "pcs cluster stop --all" sends shutdown requests to each node in
> > > sequence (probably via systemd), and if the nodes are quick
> > > enough, one could start migrating resources away before the
> > > others get their shutdown request.
> > 
> > Pcs does its best to stop nodes in parallel. The first
> > implementation of this was done back in 2015:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1180506
> > Since then, we moved to using curl for network communication, which
> > also handles parallel cluster stop. Obviously, this doesn't ensure
> > that the stop command arrives at and is processed on all nodes at
> > exactly the same time.
> > 
> > Basically, pcs sends a 'stop pacemaker' request to all nodes in
> > parallel and waits for it to finish on all nodes. Then it sends a
> > 'stop corosync' request to all nodes in parallel. The actual
> > stopping on each node is done by 'systemctl stop'.
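> > 
> > In other words, each node ends up executing roughly the equivalent
> > of the following (a simplified sketch; unit names assume a standard
> > RHEL 8 install):
> > 
> >    # phase 1, sent to every node in parallel
> >    systemctl stop pacemaker
> >    # phase 2, sent only after pacemaker has stopped on all nodes
> >    systemctl stop corosync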
> > 
> > Yes, the nodes which get the request sooner may start migrating
> > resources.
> > 
> > Regards,
> > Tomas
> 
> Given the case I had, where a resource went unmanaged and the stop
> hung indefinitely, would that be considered a bug?

That depends on why. You'll have to check the logs around that time to
see if there are any details. It would be considered appropriate if
e.g. an action with on-fail=block failed.
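
As a starting point, something like the following might help (a
sketch; the resource name and time window are taken from the original
report, and the log path assumes a default EL8 install):

   # does any operation on the resource use on-fail=block?
   pcs resource config srv01-test

   # look for failed actions and "unmanaged" messages around the event
   grep -E 'srv01-test|unmanaged' /var/log/pacemaker/pacemaker.log
   journalctl -u pacemaker --since "2021-01-24 15:00" --until "2021-01-24 17:00"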
-- 
Ken Gaillot <kgaillot at redhat.com>


