[ClusterLabs] Antw: Antw: notice: throttle_handle_load: High CPU load detected

Wed Mar 30 20:10:40 UTC 2016

On 03/29/2016 08:22 AM, Kostiantyn Ponomarenko wrote:
> Ken, thank you for the answer.
> 
> Every node in my cluster under normal conditions has "load average" of
> about 420. It is mainly connected to the high disk IO on the system.
> My system is designed to use almost 100% of its hardware (CPU/RAM/disks),
> so the situation when the system consumes almost all HW resources is
> normal.

420 suggests that HW resources are outstripped -- anything above the
system's number of cores means processes are waiting for some resource.
(Although with an I/O-bound workload like this, the number of cores
isn't very important -- most will be sitting idle despite the high
load.) And if that's during normal conditions, what will happen during a
usage spike? It sounds like a recipe for less-than-HA.
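
To make the "load versus cores" comparison concrete, here is a minimal sketch of computing per-core load on Linux. It assumes the usual /proc/loadavg layout (1-minute average as the first field) and that each logical CPU contributes one line beginning with "processor" in /proc/cpuinfo, which does not hold on every architecture. The function names are made up for illustration; this is not pacemaker's code.

```c
#include <stdio.h>
#include <string.h>

/* Count lines that start with "processor" in cpuinfo-formatted text.
 * Returns at least 1 so callers can divide by the result safely. */
long count_cores(FILE *cpuinfo)
{
    char buf[256];
    long cores = 0;
    int at_line_start = 1;

    while (fgets(buf, sizeof buf, cpuinfo) != NULL) {
        /* Only test chunks that begin a line; long lines arrive in pieces. */
        if (at_line_start && strncmp(buf, "processor", 9) == 0) {
            cores++;
        }
        at_line_start = (strchr(buf, '\n') != NULL);
    }
    return cores > 0 ? cores : 1;
}

/* Read the first field (1-minute load average) from loadavg-formatted
 * text. Returns -1.0 if it cannot be parsed. */
double read_load(FILE *loadavg)
{
    double one_minute;

    return fscanf(loadavg, "%lf", &one_minute) == 1 ? one_minute : -1.0;
}

/* Load per core; anything above 1.0 means tasks are waiting. */
double per_core_load(double load, long cores)
{
    return load / (double) (cores > 0 ? cores : 1);
}
```

On a Linux node you would pass in fopen("/proc/cpuinfo", "r") and fopen("/proc/loadavg", "r"). With the numbers from this thread, per_core_load(420.0, 24) gives 17.5.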

Under high load, there's a risk of a self-reinforcing feedback loop,
where monitors time out, causing pacemaker to schedule recovery actions,
which cause load to go higher and more monitors to time out, etc. That's
why throttling is there.

> I would like to get rid of the "High CPU load detected" messages in the
> log, because they flood corosync.log as well as the system journal.
>
> Maybe you can advise on the best way to do it?
> 
> So far I came up with the idea of setting "load-threshold" to 1000%,
> because of:
>     420 (load average) / 24 (cores) = 17.5 (adjusted_load);
>     2 (THROTTLE_FACTOR_HIGH) * 10 (throttle_load_target) = 20
> 
>     if(adjusted_load > THROTTLE_FACTOR_HIGH * throttle_load_target) {
>         crm_notice("High %s detected: %f", desc, load);

That should work, as far as reducing the log messages, though of course
it also reduces the amount of throttling pacemaker will do.
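
As a sanity check on that arithmetic, here is a small sketch of the comparison from the quoted snippet. It assumes, as the numbers above imply, that a load-threshold of N% becomes a target of N/100 and that the "high" factor is 2. The function names are invented for this example and are not pacemaker's.

```c
#include <stdbool.h>

/* Factor taken from the quoted snippet (THROTTLE_FACTOR_HIGH = 2). */
#define HIGH_FACTOR 2.0

/* True if the "High CPU load detected" condition would be met.
 * threshold_percent is the load-threshold value, e.g. 80.0 for "80%". */
bool logs_high_load(double load_avg, int cores, double threshold_percent)
{
    double adjusted_load = load_avg / cores;
    double target = threshold_percent / 100.0;

    return adjusted_load > HIGH_FACTOR * target;
}

/* load-threshold (in percent) at which the given load exactly meets,
 * and so no longer exceeds, the "high" limit. */
double quiet_threshold_percent(double load_avg, int cores)
{
    return (load_avg / cores) / HIGH_FACTOR * 100.0;
}
```

For a load average of 420 on 24 cores, the default of 80% trips the check (17.5 > 1.6), 1000% does not (17.5 < 20), and the break-even point is 875%.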

> In this case, do I need to set "node-action-limit" to something less than
> "2 x cores" (which is the default)?

It's not necessary, but it would help compensate for the reduced
throttling by imposing a maximum number of actions run at one time.

I usually wouldn't recommend reducing log verbosity, because detailed
logs are often necessary for troubleshooting cluster issues, but if your
logs are on the same I/O controller that is overloaded, you might
consider logging only to syslog and not to an additional detail file.
That would cut back on the amount of I/O due to pacemaker itself. You
could even drop PCMK_logpriority to warning, but then you're losing even
more information.

> Because the logic is (crmd/throttle.c):
> 
>     switch(r->mode) {
>         case throttle_extreme:
>         case throttle_high:
>             jobs = 1; /* At least one job must always be allowed */
>             break;
>         case throttle_med:
>             jobs = QB_MAX(1, r->max / 4);
>             break;
>         case throttle_low:
>             jobs = QB_MAX(1, r->max / 2);
>             break;
>         case throttle_none:
>             jobs = QB_MAX(1, r->max);
>             break;
>         default:
>             crm_err("Unknown throttle mode %.4x on %s", r->mode, node);
>             break;
>     }
>     return jobs;
> 
> 
> The thing is, I know that there is "High CPU load" and that this is the
> normal state, but I want Pacemaker to stop telling me about it and to
> handle this state as well as it can.

If you can't improve your I/O performance, what you suggested is
probably the best that can be done.
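
To see what the quoted switch means in numbers, here is a self-contained sketch of the same logic. The enum and helper names are stand-ins invented for this example (the real code uses QB_MAX from libqb and its own throttle mode type), and the fallback for an unknown mode is an assumption of the sketch.

```c
/* Hypothetical stand-ins for the throttle modes in the quoted switch. */
enum load_mode { MODE_NONE, MODE_LOW, MODE_MED, MODE_HIGH, MODE_EXTREME };

/* Local replacement for QB_MAX(1, n). */
static int at_least_one(int n)
{
    return n > 1 ? n : 1;
}

/* Number of jobs allowed at once for a node, given its throttle mode
 * and its configured maximum (node-action-limit). */
int job_limit(enum load_mode mode, int max_jobs)
{
    switch (mode) {
        case MODE_EXTREME:
        case MODE_HIGH:
            return 1;                           /* one job is always allowed */
        case MODE_MED:
            return at_least_one(max_jobs / 4);
        case MODE_LOW:
            return at_least_one(max_jobs / 2);
        case MODE_NONE:
            return at_least_one(max_jobs);
    }
    return 1; /* unknown mode: most conservative limit (sketch assumption) */
}
```

With 24 cores and the default limit of 2 x cores = 48, this gives 48 jobs when unthrottled, 24 at low, 12 at medium, and 1 at high or extreme. Raising load-threshold keeps the node in the less restrictive modes more of the time, which is why a lower node-action-limit can compensate.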

When I/O is that critical to you, there are many tweaks that can make a
big difference in performance. I'm not sure how familiar you are with
them already. Options depend on what your storage is (local or network,
hardware/software/no RAID, etc.) and what your I/O-bound application is
(database, etc.), but I'd look closely at cache/buffer settings at all
levels from hardware to application, RAID stripe alignment, filesystem
choice and tuning, log verbosity, etc.

> 
> Thank you,
> Kostia
> 
> On Mon, Mar 14, 2016 at 7:18 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> 
>> On 02/29/2016 07:00 AM, Kostiantyn Ponomarenko wrote:
>>> I am back to this question =)
>>>
>>> I am still trying to understand the impact of "High CPU load detected"
>>> messages in the log.
>>> Looking in the code I figured out that setting "load-threshold" parameter
>>> to something higher than 100% solves the problem.
>>> And actually for 8 cores (12 with Hyper Threading) load-threshold=400%
>>> kind of works.
>>>
>>> Also I noticed that this parameter may have an impact on "the maximum
>>> number of jobs that can be scheduled per node", as there is a formula
>>> to limit F_CRM_THROTTLE_MAX based on F_CRM_THROTTLE_MODE.
>>>
>>> Is my understanding correct that setting "load-threshold" high enough
>>> (so there are no noisy messages) affects only "throttle_job_max" and
>>> nothing more?
>>> Also, if I got it right, then "throttle_job_max" is the number of
>>> allowed parallel actions per node in lrmd.
>>> And a child of the lrmd is actually an RA process running some actions
>>> (monitor, start, etc).
>>>
>>> So there is no impact on how many RAs (resources) can run on a node,
>>> only on how Pacemaker will operate on them in parallel (I am not sure I
>>> understand this part correctly).
>>
>> I believe that is an accurate description. I think the job limit applies
>> to fence actions as well as lrmd actions.
>>
>> Note that if /proc/cpuinfo exists, pacemaker will figure out the number
>> of cores from there, and divide the actual reported load by that number
>> before comparing against load-threshold.
>>
>>> Thank you,
>>> Kostia
>>>
>>> On Wed, Jun 3, 2015 at 12:17 AM, Andrew Beekhof <andrew at beekhof.net>
>> wrote:
>>>
>>>>
>>>>> On 27 May 2015, at 10:09 pm, Kostiantyn Ponomarenko <
>>>> konstantin.ponomarenko at gmail.com> wrote:
>>>>>
>>>>> I think I wasn't precise in my questions.
>>>>> So I will try to ask more precise questions.
>>>>> 1. why the default value for "load-threshold" is 80%?
>>>>
>>>> Experimentation showed it better to begin throttling before the node
>>>> became saturated.
>>>>
>>>>> 2. what would be the impact to the cluster in case of
>>>>> "load-threshold=100%"?
>>>>
>>>> Your nodes will be busier.  Will they be able to handle your load or
>>>> will it result in additional recovery actions (creating more load and
>>>> more failures)?  Only you will know when you try.
>>>>
>>>>>
>>>>> Thank you,
>>>>> Kostya
>>>>>
>>>>> On Mon, May 25, 2015 at 4:11 PM, Kostiantyn Ponomarenko <
>>>> konstantin.ponomarenko at gmail.com> wrote:
>>>>> Guys, please, if anyone can help me to understand this parameter
>>>>> better, I would appreciate it.
>>>>>
>>>>>
>>>>> Thank you,
>>>>> Kostya
>>>>>
>>>>> On Fri, May 22, 2015 at 4:15 PM, Kostiantyn Ponomarenko <
>>>> konstantin.ponomarenko at gmail.com> wrote:
>>>>> Another question - is it crmd-specific to measure CPU usage by "I/O
>>>>> wait"?
>>>>> And if I need to get the most performance out of the running resources
>>>>> in the cluster, should I set "load-threshold=95%" (or even 100%)?
>>>>> Will it impact the cluster behavior in any way?
>>>>> The man page for crmd says: "The cluster will slow down its recovery
>>>>> process when the amount of system resources used (currently CPU)
>>>>> approaches this limit".
>>>>> Does it mean there will be delays in the cluster in moving resources
>>>>> in case a node goes down, or something else?
>>>>> I just want to understand it better.
>>>>>
>>>>> Thank you in advance for the help =)
>>>>>
>>>>> P.S.: The main resource does a lot of disk I/Os.
>>>>>
>>>>>
>>>>> Thank you,
>>>>> Kostya
>>>>>
>>>>> On Fri, May 22, 2015 at 3:30 PM, Kostiantyn Ponomarenko <
>>>> konstantin.ponomarenko at gmail.com> wrote:
>>>>> I didn't know that.
>>>>> You mentioned "as opposed to other Linuxes", but I am using Debian
>>>>> Linux.
>>>>> Does it also measure CPU usage by I/O waits?
>>>>> You are right about "I/O waits" (a screenshot of "top" is attached).
>>>>> But why does it show 50% CPU usage for a single process (the main
>>>>> one) while "I/O waits" shows a bigger number?
>>>>>
>>>>>
>>>>> Thank you,
>>>>> Kostya
>>>>>
>>>>> On Fri, May 22, 2015 at 9:40 AM, Ulrich Windl <
>>>> Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>>>>>>> "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote on
>>>>>>>> 22.05.2015 at 08:36 in message
>>>>>>>> <555EEA72020000A10001A71D at gwsmtp1.uni-regensburg.de>:
>>>>>> Hi!
>>>>>>
>>>>>> I Linux I/O waits are considered for load (as opposed to other
>>>> Linuxes) Thus
>>>>> ^^ "In"
>>>>                             s/Linux/UNIX/
>>>>>
>>>>> (I should have my coffee now to awake ;-) Sorry.