[ClusterLabs] Stopping all nodes causes servers to migrate

Ken Gaillot kgaillot at redhat.com
Mon Jan 25 15:58:03 EST 2021


On Mon, 2021-01-25 at 13:18 -0500, Digimer wrote:
> On 2021-01-25 11:01 a.m., Ken Gaillot wrote:
> > On Mon, 2021-01-25 at 09:51 +0100, Jehan-Guillaume de Rorthais
> > wrote:
> > > Hi Digimer,
> > > 
> > > On Sun, 24 Jan 2021 15:31:22 -0500
> > > Digimer <lists at alteeve.ca> wrote:
> > > [...]
> > > >   I had a test server (srv01-test) running on node 1
> > > > (el8-a01n01), and on node 2 (el8-a01n02) I ran
> > > > 'pcs cluster stop --all'.
> > > > 
> > > >   It appears that pacemaker asked the VM to migrate to node 2
> > > > instead of stopping it. Once the server was on node 2, I couldn't
> > > > use 'pcs resource disable <vm>', as it returned that the resource
> > > > was unmanaged, and the cluster shutdown was hung. When I directly
> > > > stopped the VM and then did a 'pcs resource cleanup', the cluster
> > > > shutdown completed.
> > > 
> > > As actions during a cluster shutdown cannot be handled in the same
> > > transition for each node, I usually add a step to disable all
> > > resources using the property "stop-all-resources" before shutting
> > > down the cluster:
> > > 
> > >   pcs property set stop-all-resources=true
> > >   pcs cluster stop --all
> > > 
> > > But it seems there's a very new cluster property to handle that
> > > (IIRC, added one or two releases ago). Look at the "shutdown-lock"
> > > documentation:
> > > 
> > >   [...]
> > >   some users prefer to make resources highly available only for
> > >   failures, with no recovery for clean shutdowns. If this option is
> > >   true, resources active on a node when it is cleanly shut down are
> > >   kept "locked" to that node (not allowed to run elsewhere) until
> > >   they start again on that node after it rejoins (or for at most
> > >   shutdown-lock-limit, if set).
> > >   [...]
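> > > 
> > > For example, enabling it should look something like this (untested
> > > here; the limit value is just an arbitrary illustration):
> > > 
> > >   pcs property set shutdown-lock=true
> > >   pcs property set shutdown-lock-limit=30min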
> > > 
> > > [...]
> > > >   So as best as I can tell, pacemaker really did ask for a
> > > > migration. Is this the case?
> > > 
> > > AFAIK, yes, because each cluster shutdown request is handled
> > > independently at the node level. There's a large door open for all
> > > kinds of race conditions if the requests are handled with some
> > > random lag on each node.
> > 
> > I'm going to guess that's what happened.
> > 
> > The basic issue is that there is no "cluster shutdown" in Pacemaker,
> > only "node shutdown". I'm guessing "pcs cluster stop --all" sends
> > shutdown requests for each node in sequence (probably via systemd),
> > and if the nodes are quick enough, one could start migrating off
> > resources before all the others get their shutdown request.
> > 
> > There would be a way around it. Normally Pacemaker is shut down via
> > SIGTERM to pacemakerd (which is what systemctl stop does), but inside
> > Pacemaker it's implemented as a special "shutdown" transient node
> > attribute, set to the epoch timestamp of the request. It would be
> > possible to set that attribute for all nodes in a copy of the CIB,
> > then load that into the live cluster.
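> > 
> > A rough sketch of that idea (untested; the node names are just this
> > cluster's, and it assumes the tools will operate on a saved CIB copy
> > via the CIB_file environment variable):
> > 
> >   # Work on a private copy of the CIB
> >   cibadmin --query > /tmp/cib.xml
> >   stamp=$(date +%s)
> >   for node in el8-a01n01 el8-a01n02; do
> >       # Mark each node for shutdown in the copy (transient attribute)
> >       CIB_file=/tmp/cib.xml crm_attribute --node "$node" \
> >           --name shutdown --update "$stamp" --lifetime reboot
> >   done
> >   # Push the whole modified CIB back in one operation
> >   cibadmin --replace --xml-file /tmp/cib.xml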
> > 
> > stop-all-resources as suggested would be another way around it (and
> > would have to be cleared after start-up, which could be a plus or a
> > minus depending on how much control vs convenience you want).
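> > 
> > For example, roughly (untested here):
> > 
> >   pcs property set stop-all-resources=true
> >   pcs cluster stop --all
> >   # ... later, once the nodes have been started again:
> >   pcs cluster start --all
> >   pcs property set stop-all-resources=false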
> 
> Thanks for your and everyone else's replies!
> 
> I'm left curious about one part of this, though: when the server
> migrated, the resource was then listed as unmanaged. So the resource
> was never asked to shut down, and the cluster shutdown on that node
> then hung.
> 
> I can understand what happened to trigger the migration, and I can
> understand how to prevent it in the future. (Truth be told, the Anvil!
> already would shut down all servers before calling the pacemaker stop,
> but I wanted to test possible fault conditions.)
> 
> Is it not a bug that the cluster was unable to stop after the
> migration?
> 
> If I understand what's been said in this thread, the host node got a
> shutdown request, so it migrated the resource. Then the peer (the new
> host) would have gotten its own shutdown request; should it then have
> seen that the old host was gone and shut the resource down? Why did it
> enter an unmanaged state?
> 
> Cheers

There aren't many ways the cluster can change a resource to unmanaged:
maintenance mode configured (on the cluster, node, or resource), a
failure when on-fail=block, being multiply active with
multiple-active=block, losing quorum with no-quorum-policy=freeze, or a
stop failure with no ability to fence.
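
If it helps narrow that down, something like the following (a rough
sketch, assuming the resource name srv01-test from your test) should
show which of those is in play:

  pcs property list --all | grep -E 'maintenance-mode|no-quorum-policy'
  pcs node attribute               # any per-node maintenance attribute
  pcs resource config srv01-test   # is-managed/maintenance/multiple-active
                                   # meta attributes and per-op on-fail
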
-- 
Ken Gaillot <kgaillot at redhat.com>


