[ClusterLabs] Stopping all nodes causes servers to migrate

Mon Jan 25 03:51:32 EST 2021

Hi Digimer,

On Sun, 24 Jan 2021 15:31:22 -0500
Digimer <lists at alteeve.ca> wrote:
[...]
>  I had a test server (srv01-test) running on node 1 (el8-a01n01), and on
> node 2 (el8-a01n02) I ran 'pcs cluster stop --all'.
> 
>   It appears like pacemaker asked the VM to migrate to node 2 instead of
> stopping it. Once the server was on node 2, I couldn't use 'pcs resource
> disable <vm>' as it returned that that resource was unmanaged, and the
> cluster shut down was hung. When I directly stopped the VM and then did
> a 'pcs resource cleanup', the cluster shutdown completed.

Since the actions of a cluster shutdown cannot be handled in the same
transition for every node, I usually add a step that stops all resources with
the "stop-all-resources" property before shutting down the cluster:

  pcs property set stop-all-resources=true
  pcs cluster stop --all
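
A minimal sketch of that sequence as a script, in case it helps. The function
names are hypothetical, and the "crm_resource --wait" step is my own
assumption about how to let the stops finish before any node gets its
shutdown request. Note that the property persists in the CIB, so it has to be
cleared after the next start or no resource will run:

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of pcs.

# Stop every resource first, then stop the cluster on all nodes.
stop_cluster_cleanly() {
    # Tell Pacemaker to stop all resources cluster-wide.
    pcs property set stop-all-resources=true || return 1
    # Assumption: wait for the cluster to settle so the stops are done
    # before any node receives its shutdown request.
    crm_resource --wait || return 1
    pcs cluster stop --all
}

# After the next start, clear the property, otherwise resources stay stopped.
restart_cluster() {
    pcs cluster start --all --wait || return 1
    pcs property unset stop-all-resources
}
```

Call stop_cluster_cleanly from any one node, and restart_cluster once the
maintenance is over.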

But it seems there is a fairly new cluster property to handle that (IIRC, added
one or two releases ago). Have a look at the "shutdown-lock" documentation:

  [...]
  some users prefer to make resources highly available only for failures, with
  no recovery for clean shutdowns. If this option is true, resources active on a
  node when it is cleanly shut down are kept "locked" to that node (not allowed
  to run elsewhere) until they start again on that node after it rejoins (or
  for at most shutdown-lock-limit, if set).
  [...]
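
For what it's worth, a sketch of how the property quoted above could be
enabled with pcs. The helper name and the 30min value are only examples, I am
assuming a Pacemaker version recent enough to know "shutdown-lock", and I
have not checked how it behaves with 'pcs cluster stop --all', so test it
first:

```shell
#!/bin/sh
# Sketch only: enable_shutdown_lock is a hypothetical helper name.
# Usage: enable_shutdown_lock [limit]    e.g. enable_shutdown_lock 30min
enable_shutdown_lock() {
    # Keep resources locked to a node that was cleanly shut down.
    pcs property set shutdown-lock=true || return 1
    # Optionally cap how long the lock is held (the value is an example).
    if [ -n "$1" ]; then
        pcs property set shutdown-lock-limit="$1"
    fi
}
```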

[...]
>   So as best as I can tell, pacemaker really did ask for a migration. Is
> this the case?

AFAIK, yes, because each cluster shutdown request is handled independently at
the node level. That leaves the door wide open to all kinds of race conditions
if the requests are handled with some random lag on each node.

Regards,
-- 
Jehan-Guillaume de Rorthais
Dalibo