[ClusterLabs] Antw: Antw: notice: throttle_handle_load: High CPU load detected

Mon Mar 14 13:18:11 EDT 2016

On 02/29/2016 07:00 AM, Kostiantyn Ponomarenko wrote:
> I am back to this question =)
> 
> I am still trying to understand the impact of "High CPU load detected"
> messages in the log.
> Looking at the code, I figured out that setting the "load-threshold"
> parameter to something higher than 100% solves the problem.
> And actually for 8 cores (12 with Hyper Threading) load-threshold=400% kind
> of works.
> 
> Also I noticed that this parameter may have an impact on "the maximum
> number of jobs that can be scheduled per node", as there is a
> formula to limit F_CRM_THROTTLE_MAX based on F_CRM_THROTTLE_MODE.
> 
> Is my understanding correct that setting "load-threshold" high enough
> (so that there are no noisy messages) will affect only
> "throttle_job_max" and nothing more?
> Also, if I understood correctly, "throttle_job_max" is the number of
> parallel actions allowed per node in lrmd.
> And a child of the lrmd is actually an RA process running some action
> (monitor, start, etc).
> 
> So there is no impact on how many RAs (resources) can run on a node, only
> on how Pacemaker will operate on them in parallel (I am not sure I
> understand this part correctly).

I believe that is an accurate description. I think the job limit applies
to fence actions as well as lrmd actions.

Note that if /proc/cpuinfo exists, pacemaker will figure out the number
of cores from there, and divide the actual reported load by that number
before comparing against load-threshold.
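
As a rough illustration of that normalization, here is a minimal Python sketch. It is not Pacemaker's code: the function names are hypothetical, the core count is assumed to be the number of "processor" entries in /proc/cpuinfo, and the check is reduced to a single comparison, whereas the real throttling logic may distinguish several load levels.

```python
def count_cores(cpuinfo_text: str) -> int:
    """Count "processor" entries in /proc/cpuinfo-style text (assumed format)."""
    cores = sum(
        1 for line in cpuinfo_text.splitlines() if line.startswith("processor")
    )
    # Fall back to 1 so the division below is always defined.
    return max(cores, 1)


def exceeds_threshold(load_avg: float, cores: int, threshold: float = 0.8) -> bool:
    """Divide the reported load by the core count, then compare to the threshold.

    threshold is a fraction: 0.8 stands for load-threshold=80%,
    4.0 for load-threshold=400%.
    """
    return (load_avg / cores) > threshold
```

For example, with 12 logical CPUs a reported load of 12.0 normalizes to 1.0, which is above an 80% threshold but well below a 400% one.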

> Thank you,
> Kostia
> 
> On Wed, Jun 3, 2015 at 12:17 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
> 
>>
>>> On 27 May 2015, at 10:09 pm, Kostiantyn Ponomarenko <
>> konstantin.ponomarenko at gmail.com> wrote:
>>>
>>> I think I wasn't precise in my questions.
>>> So I will try to ask more precise questions.
>>> 1. Why is the default value for "load-threshold" 80%?
>>
>> Experimentation showed it better to begin throttling before the node
>> became saturated.
>>
>>> 2. What would be the impact on the cluster in case of
>> "load-threshold=100%"?
>>
>> Your nodes will be busier.  Will they be able to handle your load or will
>> it result in additional recovery actions (creating more load and more
>> failures)?  Only you will know when you try.
>>
>>>
>>> Thank you,
>>> Kostya
>>>
>>> On Mon, May 25, 2015 at 4:11 PM, Kostiantyn Ponomarenko <
>> konstantin.ponomarenko at gmail.com> wrote:
>>> Guys, please, if anyone can help me to understand this parameter better,
>> I would appreciate it.
>>>
>>>
>>> Thank you,
>>> Kostya
>>>
>>> On Fri, May 22, 2015 at 4:15 PM, Kostiantyn Ponomarenko <
>> konstantin.ponomarenko at gmail.com> wrote:
>>> Another question: is measuring CPU usage by "I/O wait" specific to
>> crmd?
>>> And if I need to get the most performance out of the resources running in
>> the cluster, should I set "load-threshold=95%" (or even 100%)?
>>> Will it impact the cluster behavior in any way?
>>> The man page for crmd says: "The cluster will slow down its
>> recovery process when the amount of system resources used (currently CPU)
>> approaches this limit".
>>> Does it mean there will be delays in the cluster when moving resources if
>> a node goes down, or something else?
>>> I just want to understand it better.
>>>
>>> Thank you in advance for the help =)
>>>
>>> P.S.: The main resource does a lot of disk I/Os.
>>>
>>>
>>> Thank you,
>>> Kostya
>>>
>>> On Fri, May 22, 2015 at 3:30 PM, Kostiantyn Ponomarenko <
>> konstantin.ponomarenko at gmail.com> wrote:
>>> I didn't know that.
>>> You mentioned "as opposed to other Linuxes", but I am using Debian Linux.
>>> Does it also measure CPU usage by I/O waits?
>>> You are right about "I/O waits" (a screenshot of "top" is attached).
>>> But why does it show 50% CPU usage for a single process (the main
>> one) while "I/O waits" shows a bigger number?
>>>
>>>
>>> Thank you,
>>> Kostya
>>>
>>> On Fri, May 22, 2015 at 9:40 AM, Ulrich Windl <
>> Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>>>>> "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> schrieb am
>> 22.05.2015 um
>>> 08:36 in Nachricht <555EEA72020000A10001A71D at gwsmtp1.uni-regensburg.de>:
>>>> Hi!
>>>>
>>>> I Linux I/O waits are considered for load (as opposed to other
>> Linuxes) Thus
>>> ^^ "In"
>>                             s/Linux/UNIX/
>>>
>>> (I should have my coffee now to awake ;-) Sorry.