[ClusterLabs] Antw: Antw: notice: throttle_handle_load: High CPU load detected

Mon May 8 17:56:00 EDT 2017

On 05/05/2017 12:37 AM, Jitendra.Jagasia at dell.com wrote:
> Hello All,
>
> Sorry for resurrecting an old thread.
>
> I am also observing "High CPU load detected" messages in the logs.
>
> In this email chain, I see everyone suggesting a change to the
> "load-threshold" setting.
>
> But I am not able to find any good information about "load-threshold"
> except this: https://www.mankier.com/7/crmd
>
> Even the Pacemaker documentation
> (http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/pdf/Pacemaker_Explained/Pacemaker-1.1-Pacemaker_Explained-en-US.pdf)
> does not have much detail about "load-threshold".
>
> Could someone please share steps or commands to modify "load-threshold"?
>
> Thanks,
> Jitendra

Hi Jitendra,

Those messages indicate there is a real issue with the CPU load. When
the cluster notices high load, it reduces the number of actions it will
execute at the same time. This is generally a good idea, to avoid making
the load worse.

The messages don't hurt anything; they just let you know that there is
something worth investigating.

If you've investigated the load and it's not something to be concerned
about, you can change load-threshold to adjust what the cluster
considers "high". The load-threshold works like this:

* It defaults to 0.8 (which means pacemaker should try to avoid
consuming more than 80% of the system's resources).

* On a single-core machine, load-threshold is multiplied by 0.6 (because
with only one core you *really* don't want to consume too many
resources); on a multi-core machine, load-threshold is multiplied by the
number of cores (to normalize the system load per core).

* That number is then multiplied by 1.2 to get the threshold for the
"Noticeable CPU load detected" message (debug level), by 1.6 for the
"Moderate CPU load" message, and by 2.0 for the "High CPU load" message.
These thresholds are compared against the 1-minute system load average
(the same number you would get with top, uptime, etc.).
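
As a rough illustration of the calculation in the list above, here is a
minimal Python sketch. It follows the description in this message, not
the pacemaker source, and the function names (message_thresholds,
classify_load) are made up for the example.

```python
def message_thresholds(load_threshold=0.8, cores=1):
    """Return the load-average cutoff for each log message.

    Follows the description in this message, not the pacemaker source.
    """
    if cores <= 1:
        # Single core: be more conservative about consuming resources
        base = load_threshold * 0.6
    else:
        # Multiple cores: scale the threshold by the number of cores
        base = load_threshold * cores
    return {
        "noticeable": base * 1.2,  # logged at debug level
        "moderate": base * 1.6,
        "high": base * 2.0,
    }


def classify_load(load_1min, load_threshold=0.8, cores=1):
    """Name the most severe level whose cutoff the load exceeds, or None."""
    cutoffs = message_thresholds(load_threshold, cores)
    for level in ("high", "moderate", "noticeable"):
        if load_1min > cutoffs[level]:
            return level
    return None


if __name__ == "__main__":
    print(message_thresholds(0.8, 4))
    print(classify_load(7.0, 0.8, 4))
```

For example, with the default of 0.8 on a 4-core machine, the "High CPU
load" cutoff works out to 0.8 x 4 x 2.0 = 6.4 on the 1-minute load
average.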

So, if you raise load-threshold above 0.8, you won't see the log
messages until the load gets even higher (and the cluster will throttle
less). But that doesn't do anything about the actual load problem.
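
As for changing the value: load-threshold is a cluster property, so any
tool that updates cluster properties should be able to set it. Below is
a minimal Python sketch that wraps crm_attribute. It assumes
crm_attribute is installed and is run on a cluster node with sufficient
privileges. The helper names and the 120% value are placeholders for
illustration, not recommendations, and insisting on the percentage form
is just a choice made for this sketch.

```python
import subprocess


def load_threshold_command(value):
    """Build the crm_attribute command line that sets load-threshold.

    `value` is a percentage string such as "120%" (an arbitrary example).
    """
    if not value.endswith("%"):
        raise ValueError("expected a percentage such as '120%'")
    return [
        "crm_attribute",
        "--type", "crm_config",  # cluster-wide options section
        "--name", "load-threshold",
        "--update", value,
    ]


def set_load_threshold(value):
    """Run the command; raises CalledProcessError if crm_attribute fails."""
    subprocess.run(load_threshold_command(value), check=True)


if __name__ == "__main__":
    # Only prints the command; call set_load_threshold() on a cluster node.
    print(" ".join(load_threshold_command("120%")))
```

On systems managed with pcs, "pcs property set load-threshold=120%"
should have the same effect.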

> *From:*Kostiantyn Ponomarenko [mailto:konstantin.ponomarenko at gmail.com]
> *Sent:* Tuesday, April 5, 2016 8:37 AM
> *To:* kgaillot at redhat.com
> *Cc:* Cluster Labs - All topics related to open-source clustering
> welcomed <users at clusterlabs.org>
> *Subject:* Re: [ClusterLabs] Antw: Antw: notice: throttle_handle_load:
> High CPU load detected
> 
>  
> 
> Thank you, Ken.
> 
> This helps a lot.
> 
> Now I am sure that my current approach fits best for me =)
> 
> 
> Thank you,
> 
> Kostia
> 
>  
> 
> On Wed, Mar 30, 2016 at 11:10 PM, Ken Gaillot <kgaillot at redhat.com
> <mailto:kgaillot at redhat.com>> wrote:
> 
>     On 03/29/2016 08:22 AM, Kostiantyn Ponomarenko wrote:
>     > Ken, thank you for the answer.
>     >
>     > Every node in my cluster under normal conditions has "load average" of
>     > about 420. It is mainly connected to the high disk IO on the system.
>     > My system is designed to use almost 100% of its hardware (CPU/RAM/disks),
>     > so the situation when the system consumes almost all HW resources is
>     > normal.
> 
>     420 suggests that HW resources are outstripped -- anything above the
>     system's number of cores means processes are waiting for some resource.
>     (Although with an I/O-bound workload like this, the number of cores
>     isn't very important -- most will be sitting idle despite the high
>     load.) And if that's during normal conditions, what will happen during a
>     usage spike? It sounds like a recipe for less-than-HA.
> 
>     Under high load, there's a risk of a vicious cycle, where monitors
>     time out, causing pacemaker to schedule recovery actions, which cause
>     load to go higher and more monitors to time out, etc. That's why
>     throttling is there.
> 
>     > I would like to get rid of the "High CPU load detected" messages in the
>     > log, because they flood corosync.log as well as the system journal.
>     >
>     > Could you advise on the best way to do that?
>     >
>     > So far I came up with the idea of setting "load-threshold" to 1000%,
>     > because of:
>     >     420 (load average) / 24 (cores) = 17.5 (adjusted_load);
>     >     2 (THROTTLE_FACTOR_HIGH) * 10 (throttle_load_target) = 20
>     >
>     >     if(adjusted_load > THROTTLE_FACTOR_HIGH * throttle_load_target) {
>     >         crm_notice("High %s detected: %f", desc, load);
> 
>     That should work, as far as reducing the log messages, though of course
>     it also reduces the amount of throttling pacemaker will do.
> 
>     > In this case, do I need to set "node-action-limit" to something less
>     > than "2 x cores" (which is the default)?
> 
>     It's not necessary, but it would help compensate for the reduced
>     throttling by imposing a maximum number of actions run at one time.
> 
>     I usually wouldn't recommend reducing log verbosity, because detailed
>     logs are often necessary for troubleshooting cluster issues, but if your
>     logs are on the same I/O controller that is overloaded, you might
>     consider logging only to syslog and not to an additional detail file.
>     That would cut back on the amount of I/O due to pacemaker itself. You
>     could even drop PCMK_logpriority to warning, but then you're losing even
>     more information.
> 
>     > Because the logic is (crmd/throttle.c):
>     >
>     >     switch(r->mode) {
>     >         case throttle_extreme:
>     >         case throttle_high:
>     >             jobs = 1; /* At least one job must always be allowed */
>     >             break;
>     >         case throttle_med:
>     >             jobs = QB_MAX(1, r->max / 4);
>     >             break;
>     >         case throttle_low:
>     >             jobs = QB_MAX(1, r->max / 2);
>     >             break;
>     >         case throttle_none:
>     >             jobs = QB_MAX(1, r->max);
>     >             break;
>     >         default:
>     >             crm_err("Unknown throttle mode %.4x on %s", r->mode, node);
>     >             break;
>     >     }
>     >     return jobs;
>     >
>     >
>     > The thing is, I know that there is "High CPU load" and that this is the
>     > normal state, but I want Pacemaker to stop reporting it and to handle
>     > this state the best it can.
> 
>     If you can't improve your I/O performance, what you suggested is
>     probably the best that can be done.
> 
>     When I/O is that critical to you, there are many tweaks that can make a
>     big difference in performance. I'm not sure how familiar you are with
>     them already. Options depend on what your storage is (local or network,
>     hardware/software/no RAID, etc.) and what your I/O-bound application is
>     (database, etc.), but I'd look closely at cache/buffer settings at all
>     levels from hardware to application, RAID stripe alignment, filesystem
>     choice and tuning, log verbosity, etc.
> 
> 
>     >
>     > Thank you,
>     > Kostia
>     >
>     > On Mon, Mar 14, 2016 at 7:18 PM, Ken Gaillot <kgaillot at redhat.com
>     <mailto:kgaillot at redhat.com>> wrote:
>     >
>     >> On 02/29/2016 07:00 AM, Kostiantyn Ponomarenko wrote:
>     >>> I am back to this question =)
>     >>>
>     >>> I am still trying to understand the impact of "High CPU load detected"
>     >>> messages in the log.
>     >>> Looking in the code I figured out that setting the "load-threshold"
>     >>> parameter to something higher than 100% solves the problem.
>     >>> And actually for 8 cores (12 with Hyper Threading) load-threshold=400%
>     >>> kind of works.
>     >>>
>     >>> Also I noticed that this parameter may have an impact on "the
>     >>> maximum number of jobs that can be scheduled per node", as there is a
>     >>> formula to limit F_CRM_THROTTLE_MAX based on F_CRM_THROTTLE_MODE.
>     >>>
>     >>> Is my understanding correct that setting "load-threshold" high enough
>     >>> (so that there are no noisy messages) will affect only
>     >>> "throttle_job_max" and nothing more?
>     >>> Also, if I got it right, then "throttle_job_max" is the number of
>     >>> allowed parallel actions per node in lrmd.
>     >>> And a child of the lrmd is actually an RA process running some action
>     >>> (monitor, start, etc).
>     >>>
>     >>> So there is no impact on how many RAs (resources) can run on a node,
>     >>> only on how Pacemaker will operate on them in parallel (I am not sure
>     >>> I understand this part correctly).
>     >>
>     >> I believe that is an accurate description. I think the job limit applies
>     >> to fence actions as well as lrmd actions.
>     >>
>     >> Note that if /proc/cpuinfo exists, pacemaker will figure out the number
>     >> of cores from there, and divide the actual reported load by that number
>     >> before comparing against load-threshold.
>     >>
>     >>> Thank you,
>     >>> Kostia
>     >>>
>     >>> On Wed, Jun 3, 2015 at 12:17 AM, Andrew Beekhof
>     <andrew at beekhof.net <mailto:andrew at beekhof.net>>
>     >> wrote:
>     >>>
>     >>>>
>     >>>>> On 27 May 2015, at 10:09 pm, Kostiantyn Ponomarenko <
>     >>>> konstantin.ponomarenko at gmail.com
>     <mailto:konstantin.ponomarenko at gmail.com>> wrote:
>     >>>>>
>     >>>>> I think I wasn't precise in my questions.
>     >>>>> So I will try to ask more precise questions.
>     >>>>> 1. Why is the default value for "load-threshold" 80%?
>     >>>>
>     >>>> Experimentation showed it better to begin throttling before the node
>     >>>> became saturated.
>     >>>>
>     >>>>> 2. What would be the impact on the cluster of "load-threshold=100%"?
>     >>>>
>     >>>> Your nodes will be busier.  Will they be able to handle your load or
>     >>>> will it result in additional recovery actions (creating more load and
>     >>>> more failures)?  Only you will know when you try.
>     >>>>
>     >>>>>
>     >>>>> Thank you,
>     >>>>> Kostya
>     >>>>>
>     >>>>> On Mon, May 25, 2015 at 4:11 PM, Kostiantyn Ponomarenko <
>     >>>> konstantin.ponomarenko at gmail.com
>     <mailto:konstantin.ponomarenko at gmail.com>> wrote:
>     >>>>> Guys, please, if anyone can help me to understand this parameter
>     >>>>> better, I would appreciate it.
>     >>>>>
>     >>>>>
>     >>>>> Thank you,
>     >>>>> Kostya
>     >>>>>
>     >>>>> On Fri, May 22, 2015 at 4:15 PM, Kostiantyn Ponomarenko <
>     >>>> konstantin.ponomarenko at gmail.com
>     <mailto:konstantin.ponomarenko at gmail.com>> wrote:
>     >>>>> Another question - is it crmd specific to measure CPU usage by
>     >>>>> "I/O wait"?
>     >>>>> And if I need to get the most performance out of the running resources
>     >>>>> in the cluster, should I set "load-threshold=95%" (or even 100%)?
>     >>>>> Will it impact the cluster behavior in any way?
>     >>>>> The man page for crmd says: "The cluster will slow down its
>     >>>>> recovery process when the amount of system resources used (currently
>     >>>>> CPU) approaches this limit".
>     >>>>> Does it mean there will be delays in the cluster in moving resources
>     >>>>> in case a node goes down, or something else?
>     >>>>> I just want to understand it better.
>     >>>>>
>     >>>>> Thank you in advance for the help =)
>     >>>>>
>     >>>>> P.S.: The main resource does a lot of disk I/O.
>     >>>>>
>     >>>>>
>     >>>>> Thank you,
>     >>>>> Kostya
>     >>>>>
>     >>>>> On Fri, May 22, 2015 at 3:30 PM, Kostiantyn Ponomarenko <
>     >>>> konstantin.ponomarenko at gmail.com
>     <mailto:konstantin.ponomarenko at gmail.com>> wrote:
>     >>>>> I didn't know that.
>     >>>>> You mentioned "as opposed to other Linuxes", but I am using Debian
>     >>>>> Linux.
>     >>>>> Does it also measure CPU usage by I/O waits?
>     >>>>> You are right about "I/O waits" (a screenshot of "top" is attached).
>     >>>>> But why does it show 50% CPU usage for a single process (that is the
>     >>>>> main one) while "I/O waits" shows a bigger number?
>     >>>>>
>     >>>>>
>     >>>>> Thank you,
>     >>>>> Kostya
>     >>>>>
>     >>>>> On Fri, May 22, 2015 at 9:40 AM, Ulrich Windl <
>     >>>> Ulrich.Windl at rz.uni-regensburg.de
>     <mailto:Ulrich.Windl at rz.uni-regensburg.de>> wrote:
>     >>>>>>>> "Ulrich Windl" <Ulrich.Windl at rz.uni-regensburg.de> wrote on
>     >>>>>>>> 22.05.2015 at 08:36 in message
>     >>>>>>>> <555EEA72020000A10001A71D at gwsmtp1.uni-regensburg.de>:
>     >>>>>> Hi!
>     >>>>>>
>     >>>>>> I Linux I/O waits are considered for load (as opposed to other
>     >>>> Linuxes) Thus
>     >>>>> ^^ "In"
>     >>>>                             s/Linux/UNIX/
>     >>>>>
>     >>>>> (I should have my coffee now to awake ;-) Sorry.