[ClusterLabs] Antw: [EXT] Re: Stop timeout=INFINITY not working

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Wed Jan 27 02:29:30 EST 2021


>>> Ken Gaillot <kgaillot at redhat.com> wrote on 26.01.2021 at 16:08 in
message
<ec010c29d38846eb5f50dc627fd43a1510189f4c.camel at redhat.com>:
> On Tue, 2021-01-26 at 02:12 -0500, Digimer wrote:
>> Hi all,
>> 
>>   I created a resource with an INFINITE stop timeout;
>> 
>> pcs resource create srv01-test ocf:alteeve:server name="srv01-test"
>> meta
>> allow-migrate="true" target-role="stopped" op monitor interval="60"
>> start timeout="INFINITY" on-fail="block" stop timeout="INFINITY"
>> on-fail="block" migrate_to timeout="INFINITY"
> 
> I hadn't noticed this before, but it looks like INFINITY is not allowed
> in time interval specifications, and there's no log warning about it.
> :-/

Hi!

I wonder why anyone would set a timeout of a day or more. Is it to give
the operator a chance to investigate and fix problems before the cluster
attempts recovery?

Regards,
Ulrich


> 
> Time interval specifications can be an integer number of seconds, an
> ISO 8601 duration, or a number with units (s/m/h/etc.).
> 
> Timeouts are stored in milliseconds as 32-bit unsigned integers so the
> limit is a bit under 50 days (though I'd keep it well below that).
> 
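As a rough check of that limit, here is a small Python sketch of the
arithmetic. It only assumes what is stated above (milliseconds in a 32-bit
unsigned integer); the names MAX_TIMEOUT_MS and fits_in_timeout are made up
for illustration and are not Pacemaker code.

```python
# Largest value a 32-bit unsigned integer can hold, read as milliseconds.
MAX_TIMEOUT_MS = 2**32 - 1

MS_PER_DAY = 24 * 60 * 60 * 1000


def fits_in_timeout(seconds: int) -> bool:
    """Return True if a timeout in whole seconds fits in 32-bit unsigned ms."""
    return 0 <= seconds * 1000 <= MAX_TIMEOUT_MS


max_days = MAX_TIMEOUT_MS / MS_PER_DAY
print(f"ceiling: {MAX_TIMEOUT_MS} ms = {max_days:.2f} days")

print(fits_in_timeout(24 * 60 * 60))       # one day
print(fits_in_timeout(50 * 24 * 60 * 60))  # fifty days
```

So a one-day timeout fits comfortably, while fifty days would not.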
>>   Then I tried stopping it (on a highly loaded system) and it timed out
>> after just 20 seconds and got flagged as failed;
>> 
>> ====
>> Jan 26 07:06:19 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: High CPU load detected: 3.570000
>> Jan 26 07:06:49 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: High CPU load detected: 3.480000
>> Jan 26 07:07:05 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: State transition S_IDLE -> S_POLICY_ENGINE
>> Jan 26 07:07:05 el8-a01n01.alteeve.ca pacemaker-schedulerd[1846037]: notice:  * Stop       srv01-test             (               el8-a01n01 )   due to node availability
>> Jan 26 07:07:05 el8-a01n01.alteeve.ca pacemaker-schedulerd[1846037]: notice: Calculated transition 179, saving inputs in /var/lib/pacemaker/pengine/pe-input-76.bz2
>> Jan 26 07:07:05 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: Initiating stop operation srv01-test_stop_0 locally on el8-a01n01
>> Jan 26 07:07:19 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: High CPU load detected: 3.850000
>> Jan 26 07:07:25 el8-a01n01.alteeve.ca kernel: drbd srv01-test: role( Primary -> Secondary )
>> Jan 26 07:07:25 el8-a01n01.alteeve.ca pacemaker-execd[1846035]: warning: srv01-test_stop_0 process (PID 2647133) timed out
>> Jan 26 07:07:25 el8-a01n01.alteeve.ca pacemaker-execd[1846035]: warning: srv01-test_stop_0[2647133] timed out after 20000ms
>> Jan 26 07:07:25 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: error: Result of stop operation for srv01-test on el8-a01n01: Timed Out
>> Jan 26 07:07:25 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: el8-a01n01-srv01-test_stop_0:89 [ The server: [srv01-test] is indeed running. It will be shut down now.\n ]
>> ====
>> 
>> Did I not configure the stop timeout correctly?
>> 
>> Thanks for any insight.
>> 
> -- 
> Ken Gaillot <kgaillot at redhat.com>
> 