[ClusterLabs] Stopping a server failed and fenced, despite disabling stop timeout

Tue Jan 19 15:23:16 EST 2021

On 2021-01-19 4:57 a.m., Tomas Jelinek wrote:
> On 18. 01. 21 at 20:08, Digimer wrote:
>> On 2021-01-18 4:49 a.m., Tomas Jelinek wrote:
>>> Hi Digimer,
>>>
>>> Regarding pcs behavior:
>>>
>>> When deleting a resource, pcs first sets its target-role to Stopped,
>>> pushes the change into pacemaker and waits for the resource to stop.
>>> Once the resource stops, pcs removes the resource from CIB. If pcs
>>> simply removed the resource from CIB without stopping it first, the
>>> resource would be running as orphaned (until pacemaker stops it if
>>> configured to do so). We want to avoid that.
>>>
>>> If the resource cannot be stopped for whatever reason, pcs reports this
>>> and advises running the delete command with --force. Running 'pcs
>>> resource delete --force' skips the part where pcs sets target role and
>>> waits for the resource to stop, making pcs simply remove the resource
>>> from CIB.
>>>
>>> I agree that pcs should handle deleting unmanaged resources in a better
>>> way. We plan to address that, but it's not at the top of our priority list.
>>> Our plan is actually to prevent deleting unmanaged resources (or require
>>> --force to be specified to do so) based on the following scenario:
>>>
>>> If a resource is deleted while in unmanaged state, it ends up in
>>> ORPHANED state - it is removed from CIB but still present in running
>>> configuration. This can cause various issues, e.g. when an unmanaged
>>> resource is stopped manually outside of the cluster, there might be
>>> problems with stopping the resource upon deletion (while unmanaged),
>>> which may end up with stonith being initiated - this is not desired.
>>>
>>>
>>> Regards,
>>> Tomas
>>
>> This logic makes sense. If I may propose a reason for an alternative
>> method:
>>
>> In my case, the idea I was experimenting with was to remove a running
>> server from cluster management, without actually shutting down the
>> server. This is somewhat contrived, I freely admit, but the idea of
>> taking a server out of the config entirely without shutting it down
>> could be useful in some cases.
>>
>> In my case, I didn't worry about the orphaned state and the risk of it
>> trying to start elsewhere as there are additional safeguards in place to
>> prevent this (both in our software and in that DRBD is not set to
>> dual-primary, so the VM simply can't start elsewhere while it's running
>> somewhere).
>>
>> Totally understand it's not a priority, but when this is addressed, some
>> special mechanism to say "I know this will leave it orphaned and that's
>> OK" would be nice to have.
> 
> You can do it even now with "pcs resource delete --force". I admit it's
> not the best way and an extra flag (--dont-stop or similar) would be
> better. I wrote the idea into our notes so it doesn't get forgotten.
> 
> Tomas

Very much appreciated! Please let me know if/when that happens. :)
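
For anyone finding this thread in the archives, here is a rough Python
sketch of the delete flow Tomas described above: stop first, then remove
from the CIB, with --force skipping the stop. It is only a toy model for
reasoning about the orphaned case, not pcs code. ToyCluster, try_stop,
orphans and delete_resource are names made up for illustration, and the
assumption that an unmanaged resource can never be stopped by the cluster
is a simplification. Fencing is not modelled at all.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy stand-in for a cluster (hypothetical, not a pcs/pacemaker API)."""
    cib: set = field(default_factory=set)        # resources present in the configuration
    running: set = field(default_factory=set)    # resources actually running
    unmanaged: set = field(default_factory=set)  # resources the cluster will not stop

    def try_stop(self, name: str) -> bool:
        """Model 'set target-role to Stopped and wait for the stop'.

        Assumption: an unmanaged resource is never stopped, so the wait fails.
        """
        if name in self.unmanaged:
            return False
        self.running.discard(name)
        return True

    def orphans(self) -> set:
        """Resources still running but no longer present in the CIB."""
        return self.running - self.cib


def delete_resource(cluster: ToyCluster, name: str, force: bool = False) -> str:
    """Stop first, then remove from the CIB; force skips the stop step."""
    if name not in cluster.cib:
        return "not found"
    if not force and not cluster.try_stop(name):
        return "stop failed; retry with force"
    cluster.cib.discard(name)
    return "orphaned" if name in cluster.orphans() else "deleted"


if __name__ == "__main__":
    # "srv01" is a made-up resource name: running, but unmanaged.
    c = ToyCluster(cib={"srv01"}, running={"srv01"}, unmanaged={"srv01"})
    print(delete_resource(c, "srv01"))              # stop failed; retry with force
    print(delete_resource(c, "srv01", force=True))  # orphaned
```

In this model the forced delete of a running resource is exactly the "I
know this will leave it orphaned and that's OK" case. A dedicated flag
would just make that intent explicit instead of overloading --force.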

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould