[ClusterLabs] Stop timeout=INFINITY not working

Digimer lists at alteeve.ca
Tue Jan 26 02:12:25 EST 2021


Hi all,

  I created a resource with an INFINITY stop timeout:

pcs resource create srv01-test ocf:alteeve:server name="srv01-test" meta
allow-migrate="true" target-role="stopped" op monitor interval="60"
start timeout="INFINITY" on-fail="block" stop timeout="INFINITY"
on-fail="block" migrate_to timeout="INFINITY"

  Then I tried stopping it (on a highly loaded system). The stop timed
out after just 20 seconds and the resource got flagged as failed:

====
Jan 26 07:06:19 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: High CPU load detected: 3.570000
Jan 26 07:06:49 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: High CPU load detected: 3.480000
Jan 26 07:07:05 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jan 26 07:07:05 el8-a01n01.alteeve.ca pacemaker-schedulerd[1846037]: notice:  * Stop       srv01-test             (               el8-a01n01 )   due to node availability
Jan 26 07:07:05 el8-a01n01.alteeve.ca pacemaker-schedulerd[1846037]: notice: Calculated transition 179, saving inputs in /var/lib/pacemaker/pengine/pe-input-76.bz2
Jan 26 07:07:05 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: Initiating stop operation srv01-test_stop_0 locally on el8-a01n01
Jan 26 07:07:19 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: High CPU load detected: 3.850000
Jan 26 07:07:25 el8-a01n01.alteeve.ca kernel: drbd srv01-test: role( Primary -> Secondary )
Jan 26 07:07:25 el8-a01n01.alteeve.ca pacemaker-execd[1846035]: warning: srv01-test_stop_0 process (PID 2647133) timed out
Jan 26 07:07:25 el8-a01n01.alteeve.ca pacemaker-execd[1846035]: warning: srv01-test_stop_0[2647133] timed out after 20000ms
Jan 26 07:07:25 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: error: Result of stop operation for srv01-test on el8-a01n01: Timed Out
Jan 26 07:07:25 el8-a01n01.alteeve.ca pacemaker-controld[1846038]: notice: el8-a01n01-srv01-test_stop_0:89 [ The server: [srv01-test] is indeed running. It will be shut down now.\n ]
====
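
To double-check the 20 seconds against the log, here is a minimal
sketch in Python that subtracts the "Initiating stop operation"
timestamp from the "timed out" timestamp. The two timestamps are copied
by hand from the log above, the year 2021 is an assumption (syslog
lines carry no year), and the variable names are only for illustration:

```python
from datetime import datetime

# Assumption: the year is 2021; the syslog lines themselves carry no year.
FMT = "%Y %b %d %H:%M:%S"

# Timestamps copied by hand from the log excerpt.
stop_initiated = datetime.strptime("2021 Jan 26 07:07:05", FMT)
stop_timed_out = datetime.strptime("2021 Jan 26 07:07:25", FMT)

# Elapsed time between the stop request and the reported timeout.
elapsed_ms = int((stop_timed_out - stop_initiated).total_seconds() * 1000)
print(elapsed_ms)  # 20000
```

That agrees with the "timed out after 20000ms" message from
pacemaker-execd. As far as I know, 20s is also Pacemaker's default
operation timeout, so it looks like the INFINITY value was not applied
to the stop operation.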

Did I not configure the stop timeout correctly?

Thanks for any insight.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould

