[ClusterLabs] Stop timeout=INFINITY not working

Ken Gaillot kgaillot at redhat.com
Tue Jan 26 10:08:01 EST 2021


On Tue, 2021-01-26 at 02:12 -0500, Digimer wrote:
> Hi all,
> 
>   I created a resource with an INFINITE stop timeout;
> 
> pcs resource create srv01-test ocf:alteeve:server name="srv01-test" \
>   meta allow-migrate="true" target-role="stopped" \
>   op monitor interval="60" \
>   start timeout="INFINITY" on-fail="block" \
>   stop timeout="INFINITY" on-fail="block" \
>   migrate_to timeout="INFINITY"

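I hadn't noticed this before, but it looks like INFINITY is not allowed
in time interval specifications, and there's no log warning about it.
:-/ That would explain the 20000ms in your logs below: 20 seconds is the
default operation timeout, which is presumably what was used in place of
the unparseable value.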

Time interval specifications can be an integer number of seconds, an
ISO 8601 duration, or a number with units (s/m/h/etc.).
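
As a rough illustration of those formats, the stop timeout could be
replaced with a finite value along these lines. This is an untested
sketch: the one-hour value is an arbitrary assumption, not a
recommendation, and it assumes the "pcs resource update ... op" form
behaves on your pcs version as it does on current releases.

```shell
# Sketch only: the 3600-second (one hour) value is a made-up example.
# An integer is read as seconds; going by the formats described above,
# "1h" or the ISO 8601 duration "PT1H" should express the same interval.
pcs resource update srv01-test op stop timeout="3600" on-fail="block"
```

The start and migrate_to timeouts would need the same treatment, since
they were also set to INFINITY.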

Timeouts are stored in milliseconds as 32-bit unsigned integers, so the
limit is a bit under 50 days (though I'd keep it well below that).
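
The "bit under 50 days" figure can be checked with plain shell
arithmetic. This is only a sanity check of the numbers; the variable
names are made up for the example, and it assumes a shell with 64-bit
arithmetic (bash, or dash on a 64-bit system).

```shell
# Largest value a 32-bit unsigned integer can hold, read as milliseconds
max_ms=$(( (1 << 32) - 1 ))

# Milliseconds in one day
ms_per_day=$(( 24 * 60 * 60 * 1000 ))

# Whole days, plus the leftover whole hours
max_days=$(( max_ms / ms_per_day ))
rem_hours=$(( (max_ms % ms_per_day) / 3600000 ))

# prints: max timeout: 4294967295 ms (about 49 days 17 hours)
echo "max timeout: ${max_ms} ms (about ${max_days} days ${rem_hours} hours)"
```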

>   Then I tried stopping it (on a highly loaded system) and it timed out
> after just 20 seconds and got flagged as failed;
> 
> ====
> Jan 26 07:06:19 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: High CPU load detected: 3.570000
> Jan 26 07:06:49 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: High CPU load detected: 3.480000
> Jan 26 07:07:05 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> Jan 26 07:07:05 el8-a01n01.alteeve.ca pacemaker-schedulerd[1846037]: notice:  * Stop       srv01-test             (               el8-a01n01 )   due to node availability
> Jan 26 07:07:05 el8-a01n01.alteeve.ca pacemaker-schedulerd[1846037]: notice: Calculated transition 179, saving inputs in /var/lib/pacemaker/pengine/pe-input-76.bz2
> Jan 26 07:07:05 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: Initiating stop operation srv01-test_stop_0 locally on el8-a01n01
> Jan 26 07:07:19 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: High CPU load detected: 3.850000
> Jan 26 07:07:25 el8-a01n01.alteeve.ca kernel: drbd srv01-test: role( Primary -> Secondary )
> Jan 26 07:07:25 el8-a01n01.alteeve.ca pacemaker-execd[1846035]: warning: srv01-test_stop_0 process (PID 2647133) timed out
> Jan 26 07:07:25 el8-a01n01.alteeve.ca pacemaker-execd[1846035]: warning: srv01-test_stop_0[2647133] timed out after 20000ms
> Jan 26 07:07:25 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: error: Result of stop operation for srv01-test on el8-a01n01: Timed Out
> Jan 26 07:07:25 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: el8-a01n01-srv01-test_stop_0:89 [ The server: [srv01-test] is indeed running. It will be shut down now.\n ]
> ====
> 
> Did I not configure the stop timeout correctly?
> 
> Thanks for any insight.
> 
-- 
Ken Gaillot <kgaillot at redhat.com>


