[ClusterLabs] Stopping all nodes causes servers to migrate

Klaus Wenninger kwenning at redhat.com
Mon Jan 25 06:14:03 EST 2021


On 1/25/21 9:51 AM, Jehan-Guillaume de Rorthais wrote:
> Hi Digimer,
>
> On Sun, 24 Jan 2021 15:31:22 -0500
> Digimer <lists at alteeve.ca> wrote:
> [...]
>>  I had a test server (srv01-test) running on node 1 (el8-a01n01), and on
>> node 2 (el8-a01n02) I ran 'pcs cluster stop --all'.
>>
>>   It appears that pacemaker asked the VM to migrate to node 2 instead of
>> stopping it. Once the server was on node 2, I couldn't use 'pcs resource
>> disable <vm>' as it returned that the resource was unmanaged, and the
>> cluster shutdown hung. When I stopped the VM directly and then ran
>> 'pcs resource cleanup', the cluster shutdown completed.
> As actions during a cluster shutdown cannot be handled in a single transition
> covering all nodes, I usually add a step to disable all resources using the
> "stop-all-resources" property before shutting down the cluster:
>
>   pcs property set stop-all-resources=true
>   pcs cluster stop --all
>
> But it seems there's a fairly new cluster property to handle that (IIRC, added
> one or two releases ago). Look at the "shutdown-lock" documentation:
>
>   [...]
>   some users prefer to make resources highly available only for failures, with
>   no recovery for clean shutdowns. If this option is true, resources active on a
>   node when it is cleanly shut down are kept "locked" to that node (not allowed
>   to run elsewhere) until they start again on that node after it rejoins (or
>   for at most shutdown-lock-limit, if set).
>   [...]
The intention of that feature definitely isn't to serve any purpose in a
shutdown of the whole cluster. The idea is that you can restart a single
node without the cluster shuffling resources around while that is
happening. Instead, the resources running on that node would simply go
down and come up again on the same node after the restart.
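
For that single-node case, a sketch of the sequence might look like this
(node name taken from the thread above; the lock limit value is just an
example):

  # keep resources pinned to their node across a clean single-node restart
  pcs property set shutdown-lock=true
  pcs property set shutdown-lock-limit=10min  # example: give up the lock after 10 minutes

  pcs cluster stop el8-a01n01   # resources on that node stop but stay locked to it
  # ... reboot or maintain the node ...
  pcs cluster start el8-a01n01  # locked resources start again on the same node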
>
> [...]
>>   So as best as I can tell, pacemaker really did ask for a migration. Is
>> this the case?
> AFAIK, yes, because each cluster shutdown request is handled independently at
> the node level. That leaves the door wide open for all kinds of race conditions
> when the requests are processed with varying delays on each node.
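
That's why the explicit ordering suggested above helps: stop everything in
one transition first, then shut the nodes down. A minimal sketch of the full
round trip (remember to flip the property back after startup, or nothing
will start again):

  pcs property set stop-all-resources=true   # one transition stops all resources in place
  pcs cluster stop --all                     # nodes shut down with nothing left to migrate

  # later, when bringing the cluster back:
  pcs cluster start --all
  pcs property set stop-all-resources=false  # allow resources to start again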
>
>
> Regards,


