[ClusterLabs] Stopping all nodes causes servers to migrate
lists at alteeve.ca
Tue Jan 26 11:03:57 EST 2021
On 2021-01-26 10:15 a.m., Tomas Jelinek wrote:
> Dne 25. 01. 21 v 17:01 Ken Gaillot napsal(a):
>> On Mon, 2021-01-25 at 09:51 +0100, Jehan-Guillaume de Rorthais wrote:
>>> Hi Digimer,
>>> On Sun, 24 Jan 2021 15:31:22 -0500
>>> Digimer <lists at alteeve.ca> wrote:
>>>> I had a test server (srv01-test) running on node 1 (el8-a01n01),
>>>> and on
>>>> node 2 (el8-a01n02) I ran 'pcs cluster stop --all'.
>>>> It appears that pacemaker asked the VM to migrate to node 2 instead
>>>> of stopping it. Once the server was on node 2, I couldn't use 'pcs
>>>> disable <vm>' as it returned that the resource was unmanaged, and
>>>> the cluster shutdown was hung. When I directly stopped the VM and
>>>> then ran a 'pcs resource cleanup', the cluster shutdown completed.
>>> As actions during a cluster shutdown cannot be handled at the same
>>> time on each node, I usually add a step to disable all resources
>>> using "stop-all-resources" before shutting down the cluster:
>>> pcs property set stop-all-resources=true
>>> pcs cluster stop --all
>>> But it seems there's a fairly new cluster property to handle that
>>> (IIRC, added one or two releases ago). Look at the "shutdown-lock"
>>> documentation:
>>> some users prefer to make resources highly available only for
>>> failures, with
>>> no recovery for clean shutdowns. If this option is true, resources
>>> active on a
>>> node when it is cleanly shut down are kept "locked" to that node
>>> (not allowed
>>> to run elsewhere) until they start again on that node after it
>>> rejoins (or
>>> for at most shutdown-lock-limit, if set).
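
The "shutdown-lock" approach described above would look roughly like this (a sketch, not verified on a live cluster; the property requires a recent Pacemaker, reportedly 2.0.4 or later, and the 10-minute limit is just an illustrative value):

```shell
# Sketch: keep resources "locked" to their node across a clean shutdown,
# so they are not migrated or recovered elsewhere.
# Assumes Pacemaker >= 2.0.4; values are illustrative.
pcs property set shutdown-lock=true
pcs property set shutdown-lock-limit=10min   # optional upper bound on the lock
pcs cluster stop --all
```

After the nodes rejoin, each locked resource should start again on the node it was running on, rather than being recovered on a peer.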
>>>> So as best I can tell, pacemaker really did ask for a migration. Is
>>>> this the case?
>>> AFAIK, yes, because each cluster shutdown request is handled
>>> independently at the node level. That leaves a wide door open to all
>>> kinds of race conditions, as requests are handled with some random
>>> lag on each node.
>> I'm going to guess that's what happened.
>> The basic issue is that there is no "cluster shutdown" in Pacemaker,
>> only "node shutdown". I'm guessing "pcs cluster stop --all" sends
>> shutdown requests for each node in sequence (probably via systemd), and
>> if the nodes are quick enough, one could start migrating off resources
>> before all the others get their shutdown request.
> Pcs is doing its best to stop nodes in parallel. The first
> implementation of this was done back in 2015.
> Since then, we moved to using curl for network communication, which also
> handles parallel cluster stop. Obviously, this doesn't ensure the stop
> command arrives at and is processed on all nodes at exactly the same time.
> Basically, pcs sends 'stop pacemaker' request to all nodes in parallel
> and waits for it to finish on all nodes. Then it sends 'stop corosync'
> request to all nodes in parallel. The actual stopping on each node is
> done by 'systemctl stop'.
> Yes, the nodes which get the request sooner may start migrating resources.
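
The two-phase parallel stop described above could be sketched, very roughly, as follows (a hypothetical illustration only, not pcs's actual code; `stop_service` stands in for the network request pcs sends to each node):

```python
# Rough sketch of the flow Tomas describes: send 'stop pacemaker' to all
# nodes in parallel, wait for every node to finish, then send
# 'stop corosync' the same way. Each node would actually run
# 'systemctl stop <service>' when it receives the request.
from concurrent.futures import ThreadPoolExecutor


def stop_service(node, service):
    # Stand-in for the HTTP request pcs sends to a node (hypothetical).
    return f"{service} stopped on {node}"


def parallel_stop(nodes):
    results = []
    for service in ("pacemaker", "corosync"):
        # All requests for one phase go out in parallel...
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(stop_service, n, service) for n in nodes]
        # ...and leaving the 'with' block waits for every node to finish
        # before the next phase begins.
        results.extend(f.result() for f in futures)
    return results


print(parallel_stop(["el8-a01n01", "el8-a01n02"]))
```

Even with this barrier between phases, nothing synchronizes the instant each node *begins* stopping pacemaker, which is the window where a faster node can start migrating resources toward slower ones.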
Given the case I had, where a resource went unmanaged and the stop hung
indefinitely, would that be considered a bug?
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould