[ClusterLabs] Stopping all nodes causes servers to migrate

Mon Jan 25 16:15:30 EST 2021

On 2021-01-25 3:58 p.m., Ken Gaillot wrote:
> On Mon, 2021-01-25 at 13:18 -0500, Digimer wrote:
>> On 2021-01-25 11:01 a.m., Ken Gaillot wrote:
>>> On Mon, 2021-01-25 at 09:51 +0100, Jehan-Guillaume de Rorthais wrote:
>>>> Hi Digimer,
>>>>
>>>> On Sun, 24 Jan 2021 15:31:22 -0500
>>>> Digimer <lists at alteeve.ca> wrote:
>>>> [...]
>>>>>   I had a test server (srv01-test) running on node 1 (el8-a01n01),
>>>>> and on node 2 (el8-a01n02) I ran 'pcs cluster stop --all'.
>>>>>
>>>>>   It appears that pacemaker asked the VM to migrate to node 2
>>>>> instead of stopping it. Once the server was on node 2, I couldn't
>>>>> use 'pcs resource disable <vm>' as it reported that the resource
>>>>> was unmanaged, and the cluster shutdown hung. When I directly
>>>>> stopped the VM and then did a 'pcs resource cleanup', the cluster
>>>>> shutdown completed.
>>>>
>>>> As actions during a cluster shutdown cannot be handled in the same
>>>> transition for each node, I usually add a step to disable all
>>>> resources using the property "stop-all-resources" before shutting
>>>> down the cluster:
>>>>
>>>>   pcs property set stop-all-resources=true
>>>>   pcs cluster stop --all
>>>>
>>>> But it seems there's a very new cluster property to handle that
>>>> (IIRC, one or two releases ago). Look at the "shutdown-lock" doc:
>>>>
>>>>   [...]
>>>>   some users prefer to make resources highly available only for
>>>>   failures, with no recovery for clean shutdowns. If this option is
>>>>   true, resources active on a node when it is cleanly shut down are
>>>>   kept "locked" to that node (not allowed to run elsewhere) until
>>>>   they start again on that node after it rejoins (or for at most
>>>>   shutdown-lock-limit, if set).
>>>>   [...]
>>>>
>>>> [...]
>>>>>   So as best as I can tell, pacemaker really did ask for a
>>>>> migration. Is this the case?
>>>>
>>>> AFAIK, yes, because each cluster shutdown request is handled
>>>> independently at node level. That leaves the door wide open for all
>>>> kinds of race conditions if requests are handled with some random
>>>> lag on each node.
>>>
>>> I'm going to guess that's what happened.
>>>
>>> The basic issue is that there is no "cluster shutdown" in Pacemaker,
>>> only "node shutdown". I'm guessing "pcs cluster stop --all" sends
>>> shutdown requests for each node in sequence (probably via systemd),
>>> and if the nodes are quick enough, one could start migrating off
>>> resources before all the others get their shutdown request.
>>>
>>> There would be a way around it. Normally Pacemaker is shut down via
>>> SIGTERM to pacemakerd (which is what systemctl stop does), but inside
>>> Pacemaker it's implemented as a special "shutdown" transient node
>>> attribute, set to the epoch timestamp of the request. It would be
>>> possible to set that attribute for all nodes in a copy of the CIB,
>>> then load that into the live cluster.
>>>
>>> stop-all-resources as suggested would be another way around it (and
>>> would have to be cleared after start-up, which could be a plus or a
>>> minus depending on how much control vs convenience you want).
>>
>> Thanks for your and everyone else's replies!
>>
>> I'm left curious about one part of this though: when the server
>> migrated, the resource was then listed as unmanaged. So the resource
>> was never asked to shut down, and the cluster shutdown on that node
>> then hung.
>>
>> I can understand what's happening that triggered the migration, and I
>> can understand how to prevent it in the future. (Truth be told, the
>> Anvil! already would shut down all servers before calling the
>> pacemaker stop, but I wanted to test possible fault conditions).
>>
>> Is it not a bug that the cluster was unable to stop after the migration?
>>
>> If I understand what's been said in this thread, the host node got a
>> shutdown request so it migrated the resource. Then the peer (new host)
>> would have gotten the shutdown request. Should it then have seen the
>> peer was gone and shut the resource down? Why did it enter an
>> unmanaged state?
>>
>> Cheers
> 
> There aren't many ways the cluster can change a resource to unmanaged:
> maintenance mode configured (on the cluster, node, or resource), a
> failure when on-fail=block, being multiply active with
> multiple-active=block, losing quorum with no-quorum-policy=freeze, or a
> stop failure with no ability to fence.

Sorry, let me clarify: the resource was managed when I called 'pcs
cluster stop --all', so something in the background set it to
'unmanaged'. I suppose I would need to look in the logs...
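
For reference, the stop-all-resources workaround suggested earlier in
the thread could be wrapped up roughly as below. This is only a sketch,
not a tested procedure: the run() helper, the DRY_RUN variable, and the
two function names are hypothetical and not part of pcs. The pcs
commands themselves are the ones quoted above, plus the matching start
and property reset that Ken noted would be needed after start-up.

```shell
#!/bin/sh
# Sketch: stop every resource before stopping the cluster, so that no node
# tries to migrate resources to a peer that has not yet been told to stop.
# DRY_RUN and run() are hypothetical helpers, not part of pcs.

DRY_RUN="${DRY_RUN:-1}"   # default: print the commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

stop_whole_cluster() {
    # Ask the cluster to stop all resources, then stop every node.
    run pcs property set stop-all-resources=true &&
    run pcs cluster stop --all
}

start_whole_cluster() {
    # Start every node, then allow resources to run again.
    run pcs cluster start --all &&
    run pcs property set stop-all-resources=false
}
```

With DRY_RUN left at 1, calling stop_whole_cluster only prints the two
pcs commands. On a real cluster the property can only be reset once the
cluster is back up, so some wait between the two start-up steps would
likely be needed.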

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould