[ClusterLabs] Antw: [EXT] Re: Stop timeout=INFINITY not working

Wed Jan 27 18:10:44 EST 2021

On 2021-01-27 2:29 a.m., Ulrich Windl wrote:
>>>> Ken Gaillot <kgaillot at redhat.com> wrote on 26.01.2021 at 16:08 in
> message
> <ec010c29d38846eb5f50dc627fd43a1510189f4c.camel at redhat.com>:
>> On Tue, 2021-01-26 at 02:12 -0500, Digimer wrote:
>>> Hi all,
>>>
>>>   I created a resource with an INFINITE stop timeout;
>>>
>>> pcs resource create srv01-test ocf:alteeve:server name="srv01-test"
>>> meta
>>> allow-migrate="true" target-role="stopped" op monitor interval="60"
>>> start timeout="INFINITY" on-fail="block" stop timeout="INFINITY"
>>> on-fail="block" migrate_to timeout="INFINITY"
>>
>> I hadn't noticed this before, but it looks like INFINITY is not allowed
>> in time interval specifications, and there's no log warning about it.
>> :-/
> 
> Hi!
> 
> I was wondering why someone would set a timeout to something like a day or
> more:
> To give the operator a chance to investigate and fix problems before the
> cluster tries recovery?
> 
> Regards,
> Ulrich

Windows.

Microsoft decided a while back that the perfect time to install OS
updates was when a Windows server or workstation was told to shut down.
Back in the rgmanager days, this was a problem because rgmanager
terminated a resource that didn't stop within two minutes. So while a
client's Windows server VM was saying "Do not power off your computer!",
the cluster pulled the plug.

So we set an INFINITE stop timeout (well, that needs to change now) so
that if this happened, the cluster would keep waiting. It's dumb, but it
is what it is.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould