<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">


<head>


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


<meta name="Generator" content="Microsoft Word 15 (filtered medium)">


<style><!--


/* Font Definitions */


@font-face


        {font-family:"Cambria Math";


        panose-1:2 4 5 3 5 4 6 3 2 4;}


@font-face


        {font-family:Calibri;


        panose-1:2 15 5 2 2 2 4 3 2 4;}


/* Style Definitions */


p.MsoNormal, li.MsoNormal, div.MsoNormal


        {margin:0in;


        margin-bottom:.0001pt;


        font-size:12.0pt;


        font-family:"Times New Roman",serif;}


a:link, span.MsoHyperlink


        {mso-style-priority:99;


        color:blue;


        text-decoration:underline;}


a:visited, span.MsoHyperlinkFollowed


        {mso-style-priority:99;


        color:purple;


        text-decoration:underline;}


p


        {mso-style-priority:99;


        mso-margin-top-alt:auto;


        margin-right:0in;


        mso-margin-bottom-alt:auto;


        margin-left:0in;


        font-size:12.0pt;


        font-family:"Times New Roman",serif;}


span.EmailStyle17


        {mso-style-type:personal-reply;


        font-family:"Calibri",sans-serif;


        color:#1F497D;}


.MsoChpDefault


        {mso-style-type:export-only;}


@page WordSection1


        {size:8.5in 11.0in;


        margin:1.0in 1.0in 1.0in 1.0in;}


div.WordSection1


        {page:WordSection1;}


--></style><!--[if gte mso 9]><xml>


<o:shapedefaults v:ext="edit" spidmax="1026" />


</xml><![endif]--><!--[if gte mso 9]><xml>


<o:shapelayout v:ext="edit">


<o:idmap v:ext="edit" data="1" />


</o:shapelayout></xml><![endif]-->


</head>


<body lang="EN-US" link="blue" vlink="purple">


<div class="WordSection1">


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"><o:p> </o:p></span></p>

<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">Hello All,<o:p></o:p></span></p>

<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"><o:p> </o:p></span></p>

<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">Sorry for resurrecting an old thread.<o:p></o:p></span></p>


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"><o:p> </o:p></span></p>

<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">I am also seeing “High CPU load detected” messages in the logs.<o:p></o:p></span></p>

<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"><o:p> </o:p></span></p>

<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">In this email chain, I see that everyone suggests changing the “load-threshold” setting.<o:p></o:p></span></p>


<p class="MsoNormal"><o:p> </o:p></p>


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">However, I cannot find much information about “load-threshold” apart from this page:
</span><a href="https://www.mankier.com/7/crmd">https://www.mankier.com/7/crmd</a><o:p></o:p></p>


<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"><o:p> </o:p></span></p>

<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">Even the Pacemaker Explained document (</span><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:black"><a href="http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/pdf/Pacemaker_Explained/Pacemaker-1.1-Pacemaker_Explained-en-US.pdf">http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/pdf/Pacemaker_Explained/Pacemaker-1.1-Pacemaker_Explained-en-US.pdf</a></span><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">) has little detail about “load-threshold”.<o:p></o:p></span></p>

<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"><o:p> </o:p></span></p>

<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">Could someone please share the steps or commands needed to modify “load-threshold”?<o:p></o:p></span></p>


<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:black"><o:p> </o:p></span></p>

<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">Thanks<o:p></o:p></span></p>

<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">Jitendra<o:p></o:p></span></p>

<p class="MsoNormal"><a name="_MailEndCompose"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"><o:p> </o:p></span></a></p>


<p class="MsoNormal"><a name="_____replyseparator"></a><b><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif"> Kostiantyn Ponomarenko [mailto:konstantin.ponomarenko@gmail.com]
<br>
<b>Sent:</b> Tuesday, April 5, 2016 8:37 AM<br>
<b>To:</b> kgaillot@redhat.com<br>
<b>Cc:</b> Cluster Labs - All topics related to open-source clustering welcomed &lt;users@clusterlabs.org&gt;<br>
<b>Subject:</b> Re: [ClusterLabs] Antw: Antw: notice: throttle_handle_load: High CPU load detected<o:p></o:p></span></p>


<p class="MsoNormal"><o:p> </o:p></p>


<div>


<p class="MsoNormal">Thank you, Ken.<o:p></o:p></p>


<div>


<p class="MsoNormal">This helps a lot.<o:p></o:p></p>


</div>


<div>


<p class="MsoNormal">Now I am sure that my current approach is the best fit for me =)<o:p></o:p></p>


</div>


</div>


<div>


<p class="MsoNormal"><br clear="all">


<o:p></o:p></p>


<div>


<div>


<div>


<div>


<div>


<p class="MsoNormal">Thank you,<o:p></o:p></p>


<div>


<p class="MsoNormal">Kostia<o:p></o:p></p>


</div>


</div>


</div>


</div>


</div>


</div>


<p class="MsoNormal"><o:p> </o:p></p>


<div>


<p class="MsoNormal">On Wed, Mar 30, 2016 at 11:10 PM, Ken Gaillot <<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>> wrote:<o:p></o:p></p>


<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">


<p class="MsoNormal">On 03/29/2016 08:22 AM, Kostiantyn Ponomarenko wrote:<br>


> Ken, thank you for the answer.<br>


><br>


> Under normal conditions, every node in my cluster has a load average of<br>
> about 420. This is mainly due to the high disk I/O on the system.<br>
> My system is designed to use almost 100% of its hardware (CPU/RAM/disks),<br>
> so a situation where the system consumes almost all HW resources is<br>
> normal.<br>


<br>


420 suggests that HW resources are outstripped -- anything above the<br>


system's number of cores means processes are waiting for some resource.<br>


(Although with an I/O-bound workload like this, the number of cores<br>


isn't very important -- most will be sitting idle despite the high<br>


load.) And if that's during normal conditions, what will happen during a<br>


usage spike? It sounds like a recipe for less-than-HA.<br>


<br>


Under high load, there's a risk of negative feedback, where monitors<br>


time out, causing pacemaker to schedule recovery actions, which cause<br>


load to go higher and more monitors to time out, etc. That's why<br>


throttling is there.<br>


<br>


> I would like to get rid of the "High CPU load detected" messages in the<br>
> log, because they flood corosync.log as well as the system journal.<br>
><br>
> Could you advise on the best way to do it?<br>
><br>
> So far I have come up with the idea of setting "load-threshold" to 1000%,<br>
> because:<br>
>     420 (load average) / 24 (cores) = 17.5 (adjusted_load);<br>
>     2 (THROTTLE_FACTOR_HIGH) * 10 (throttle_load_target) = 20<br>


><br>


>     if(adjusted_load > THROTTLE_FACTOR_HIGH * throttle_load_target) {<br>


>         crm_notice("High %s detected: %f", desc, load);<br>


<br>


That should work, as far as reducing the log messages, though of course<br>


it also reduces the amount of throttling pacemaker will do.<br>


<br>


> In this case, do I need to set "node-action-limit" to something less than<br>
> "2 x cores" (which is the default)?<br>


<br>


It's not necessary, but it would help compensate for the reduced<br>


throttling by imposing a maximum number of actions run at one time.<br>


<br>


I usually wouldn't recommend reducing log verbosity, because detailed<br>


logs are often necessary for troubleshooting cluster issues, but if your<br>


logs are on the same I/O controller that is overloaded, you might<br>


consider logging only to syslog and not to an additional detail file.<br>


That would cut back on the amount of I/O due to pacemaker itself. You<br>


could even drop PCMK_logpriority to warning, but then you're losing even<br>


more information.<br>


<br>


> Because the logic is (crmd/throttle.c):<br>


><br>


>     switch(r->mode) {<br>


>         case throttle_extreme:<br>


>         case throttle_high:<br>


>             jobs = 1; /* At least one job must always be allowed */<br>


>             break;<br>


>         case throttle_med:<br>


>             jobs = QB_MAX(1, r->max / 4);<br>


>             break;<br>


>         case throttle_low:<br>


>             jobs = QB_MAX(1, r->max / 2);<br>


>             break;<br>


>         case throttle_none:<br>


>             jobs = QB_MAX(1, r->max);<br>


>             break;<br>


>         default:<br>


>             crm_err("Unknown throttle mode %.4x on %s", r->mode, node);<br>


>             break;<br>


>     }<br>


>     return jobs;<br>


><br>


><br>


> The thing is, I know that there is "High CPU load" and that this is the normal<br>
> state, but I want Pacemaker to stop reporting it to me and to handle this<br>
> state as best it can.<br>


<br>


If you can't improve your I/O performance, what you suggested is<br>


probably the best that can be done.<br>


<br>


When I/O is that critical to you, there are many tweaks that can make a<br>


big difference in performance. I'm not sure how familiar you are with<br>


them already. Options depend on what your storage is (local or network,<br>


hardware/software/no RAID, etc.) and what your I/O-bound application is<br>


(database, etc.), but I'd look closely at cache/buffer settings at all<br>


levels from hardware to application, RAID stripe alignment, filesystem<br>


choice and tuning, log verbosity, etc.<o:p></o:p></p>
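<p class="MsoNormal">Kostia's arithmetic for the high-load notice can be checked with a small sketch. It assumes, as the quoted numbers imply, that throttle_load_target is load-threshold divided by 100 and that the load average is divided by the core count when that count is known; the function names are hypothetical, and the real crmd code may treat a single-core node differently.</p>
<pre>
```c
/* Hypothetical helpers; the arithmetic follows the thread:
 *   throttle_load_target = load-threshold / 100
 *   adjusted_load        = load average / cores
 * and "High CPU load detected" is logged when
 *   adjusted_load > THROTTLE_FACTOR_HIGH * throttle_load_target. */
#define THROTTLE_FACTOR_HIGH 2.0

/* Per-core load: divide by the core count when it is known. */
static double adjusted_load(double load, int cores)
{
    return cores > 0 ? load / cores : load;
}

/* Returns 1 if the "High CPU load detected" notice would be logged. */
int would_log_high_load(double load, int cores, double threshold_pct)
{
    double target = threshold_pct / 100.0;   /* e.g. 1000% -> 10.0 */
    return adjusted_load(load, cores) > THROTTLE_FACTOR_HIGH * target;
}

/* Smallest load-threshold (in percent) that keeps the notice quiet,
 * from solving adjusted_load == THROTTLE_FACTOR_HIGH * pct / 100. */
double min_quiet_threshold_pct(double load, int cores)
{
    return adjusted_load(load, cores) / THROTTLE_FACTOR_HIGH * 100.0;
}
```
</pre>
<p class="MsoNormal">With the thread's numbers (load 420, 24 cores), min_quiet_threshold_pct gives 875, so any load-threshold of 875% or more keeps the high-load notice quiet at that load, and 1000% leaves some headroom for spikes. This only models the high-load notice, not the lower throttle levels in the quoted switch.</p>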


<div>


<div>
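<p class="MsoNormal">The switch quoted from crmd/throttle.c can be made stand-alone to see how many jobs each throttle mode allows. A minimal sketch, assuming: the enum values are illustrative (the real crmd enum may use different values), QB_MAX from libqb is replaced by a local MAX_OF macro, and max_jobs stands in for r->max, the per-node limit that the thread says defaults to 2 x cores. jobs_allowed is a hypothetical name.</p>
<pre>
```c
/* Local stand-in for libqb's QB_MAX. */
#define MAX_OF(a, b) ((a) > (b) ? (a) : (b))

/* Illustrative throttle modes; the values are not pacemaker's. */
enum throttle_mode {
    throttle_none,
    throttle_low,
    throttle_med,
    throttle_high,
    throttle_extreme
};

/* max_jobs plays the role of r->max, the per-node action limit
 * (node-action-limit, said in the thread to default to 2 x cores). */
int jobs_allowed(enum throttle_mode mode, int max_jobs)
{
    switch (mode) {
        case throttle_extreme:
        case throttle_high:
            return 1;                        /* at least one job is always allowed */
        case throttle_med:
            return MAX_OF(1, max_jobs / 4);
        case throttle_low:
            return MAX_OF(1, max_jobs / 2);
        case throttle_none:
            return MAX_OF(1, max_jobs);
    }
    return 1;  /* this sketch falls back to 1; the quoted code logs crm_err() */
}
```
</pre>
<p class="MsoNormal">For a 24-core node with a limit of 48, this gives 1, 12, 24 and 48 jobs for high, medium, low and no throttling. When load-threshold is raised so far that the mode stays at throttle_none, the only cap left is max_jobs itself, which is why lowering node-action-limit, as Ken suggests, compensates for the reduced throttling.</p>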


<p class="MsoNormal" style="margin-bottom:12.0pt"><br>


><br>


> Thank you,<br>


> Kostia<br>


><br>


> On Mon, Mar 14, 2016 at 7:18 PM, Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>> wrote:<br>


><br>


>> On 02/29/2016 07:00 AM, Kostiantyn Ponomarenko wrote:<br>


>>> I am back to this question =)<br>


>>><br>


>>> I am still trying to understand the impact of "High CPU load detected"<br>


>>> messages in the log.<br>


>>> Looking at the code, I figured out that setting the "load-threshold"<br>
>>> parameter to something higher than 100% solves the problem.<br>
>>> And actually, for 8 cores (12 with Hyper-Threading), load-threshold=400%<br>
>>> kind of works.<br>


>>><br>


>>> I also noticed that this parameter may affect "the maximum number of<br>
>>> jobs that can be scheduled per node", since there is a formula that<br>
>>> limits F_CRM_THROTTLE_MAX based on F_CRM_THROTTLE_MODE.<br>
>>><br>
>>> Is my understanding correct that setting "load-threshold" high enough<br>
>>> (so that there are no noisy messages) affects only "throttle_job_max"<br>
>>> and nothing more?<br>
>>> Also, if I understand correctly, "throttle_job_max" is the number of<br>
>>> parallel actions allowed per node in lrmd,<br>
>>> and a child of the lrmd is actually an RA process running some action<br>
>>> (monitor, start, etc.).<br>
>>><br>
>>> So there is no impact on how many RAs (resources) can run on a node,<br>
>>> only on how Pacemaker operates on them in parallel (I am not sure I<br>
>>> understand this part correctly).<br>


>><br>


>> I believe that is an accurate description. I think the job limit applies<br>


>> to fence actions as well as lrmd actions.<br>


>><br>


>> Note that if /proc/cpuinfo exists, pacemaker will figure out the number<br>


>> of cores from there, and divide the actual reported load by that number<br>


>> before comparing against load-threshold.<br>


>><br>


>>> Thank you,<br>


>>> Kostia<br>


>>><br>


>>> On Wed, Jun 3, 2015 at 12:17 AM, Andrew Beekhof <<a href="mailto:andrew@beekhof.net">andrew@beekhof.net</a>> wrote:<br>


>>><br>


>>>><br>


>>>>> On 27 May 2015, at 10:09 pm, Kostiantyn Ponomarenko <<a href="mailto:konstantin.ponomarenko@gmail.com">konstantin.ponomarenko@gmail.com</a>> wrote:<br>


>>>>><br>


>>>>> I don't think my questions were precise,<br>
>>>>> so I will try to ask more precise ones.<br>
>>>>> 1. Why is the default value for "load-threshold" 80%?<br>


>>>><br>


>>>> Experimentation showed it better to begin throttling before the node<br>


>>>> became saturated.<br>


>>>><br>


>>>>> 2. What would be the impact on the cluster of "load-threshold=100%"?<br>
>>>><br>
>>>> Your nodes will be busier.  Will they be able to handle your load, or will<br>
>>>> it result in additional recovery actions (creating more load and more<br>
>>>> failures)?  Only you will know when you try.<br>


>>>><br>


>>>>><br>


>>>>> Thank you,<br>


>>>>> Kostya<br>


>>>>><br>


>>>>> On Mon, May 25, 2015 at 4:11 PM, Kostiantyn Ponomarenko <<a href="mailto:konstantin.ponomarenko@gmail.com">konstantin.ponomarenko@gmail.com</a>> wrote:<br>
>>>>> Guys, if anyone can help me understand this parameter better,<br>
>>>>> I would appreciate it.<br>


>>>>><br>


>>>>><br>


>>>>> Thank you,<br>


>>>>> Kostya<br>


>>>>><br>


>>>>> On Fri, May 22, 2015 at 4:15 PM, Kostiantyn Ponomarenko <<a href="mailto:konstantin.ponomarenko@gmail.com">konstantin.ponomarenko@gmail.com</a>> wrote:<br>
>>>>> Another question: is counting "I/O wait" as CPU usage specific to crmd?<br>
>>>>> And if I need to get the most performance out of the resources running in<br>
>>>>> the cluster, should I set "load-threshold=95%" (or even 100%)?<br>
>>>>> Will it affect cluster behavior in any way?<br>
>>>>> The man page for crmd says: "The cluster will slow down its recovery<br>
>>>>> process when the amount of system resources used (currently CPU)<br>
>>>>> approaches this limit."<br>
>>>>> Does that mean there will be delays in moving resources if a node<br>
>>>>> goes down, or something else?<br>
>>>>> I just want to understand it better.<br>
>>>>><br>
>>>>> Thank you in advance for the help =)<br>


>>>>><br>


>>>>> P.S.: The main resource does a lot of disk I/Os.<br>


>>>>><br>


>>>>><br>


>>>>> Thank you,<br>


>>>>> Kostya<br>


>>>>><br>


>>>>> On Fri, May 22, 2015 at 3:30 PM, Kostiantyn Ponomarenko <<a href="mailto:konstantin.ponomarenko@gmail.com">konstantin.ponomarenko@gmail.com</a>> wrote:<br>
>>>>> I didn't know that.<br>
>>>>> You mentioned "as opposed to other Linuxes", but I am using Debian Linux.<br>
>>>>> Does it also count I/O waits as CPU usage?<br>
>>>>> You are right about "I/O waits" (a screenshot of "top" is attached).<br>
>>>>> But why does it show 50% CPU usage for a single process (the main<br>
>>>>> one) while "I/O waits" shows a bigger number?<br>


>>>>><br>


>>>>><br>


>>>>> Thank you,<br>


>>>>> Kostya<br>


>>>>><br>


>>>>> On Fri, May 22, 2015 at 9:40 AM, Ulrich Windl <<a href="mailto:Ulrich.Windl@rz.uni-regensburg.de">Ulrich.Windl@rz.uni-regensburg.de</a>> wrote:<br>
>>>>>>>> "Ulrich Windl" <<a href="mailto:Ulrich.Windl@rz.uni-regensburg.de">Ulrich.Windl@rz.uni-regensburg.de</a>> wrote on<br>
>>>>>>>> 22.05.2015 at 08:36 in message <<a href="mailto:555EEA72020000A10001A71D@gwsmtp1.uni-regensburg.de">555EEA72020000A10001A71D@gwsmtp1.uni-regensburg.de</a>>:<br>


>>>>>> Hi!<br>


>>>>>><br>


>>>>>> I Linux I/O waits are considered for load (as opposed to other<br>


>>>> Linuxes) Thus<br>


>>>>> ^^ "In"<br>


>>>>                             s/Linux/UNIX/<br>


>>>>><br>


>>>>> (I should have my coffee now to awake ;-) Sorry.<br>


>><br>


>> _______________________________________________<br>


>> Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>


>> <a href="http://clusterlabs.org/mailman/listinfo/users" target="_blank">http://clusterlabs.org/mailman/listinfo/users</a><br>


>><br>


>> Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>


>> Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">


http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>


>> Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>


>><br>


><o:p></o:p></p>
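<p class="MsoNormal">Ken's note that pacemaker derives the core count from /proc/cpuinfo can be illustrated with a minimal sketch that works on the file's text. It assumes the count is the number of lines beginning with "processor" and that the code falls back to 1 when none are found; starts_with and count_processors are hypothetical names, and reading the file is left to the caller. On Linux, /proc/cpuinfo has one processor entry per logical CPU, so under this assumption Hyper-Threading siblings are counted too.</p>
<pre>
```c
/* Returns 1 if s begins with prefix, 0 otherwise. Stops safely at the
 * end of s because '\0' never equals a non-NUL prefix character. */
static int starts_with(const char *s, const char *prefix)
{
    while (*prefix) {
        if (*s++ != *prefix++) {
            return 0;
        }
    }
    return 1;
}

/* Count lines beginning with "processor" in the text of /proc/cpuinfo.
 * Assumption: a single processor is assumed when none are found. */
int count_processors(const char *cpuinfo)
{
    int cores = 0;
    const char *line = cpuinfo;

    while (*line) {
        if (starts_with(line, "processor")) {
            cores++;
        }
        /* advance to the start of the next line */
        while (*line && *line != '\n') {
            line++;
        }
        if (*line == '\n') {
            line++;
        }
    }
    return cores > 0 ? cores : 1;
}
```
</pre>
<p class="MsoNormal">For a node whose /proc/cpuinfo lists 24 processors, a load average of 420 would then be compared against load-threshold as 420 / 24 = 17.5.</p>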


</div>
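<p class="MsoNormal">On the I/O wait question: on Linux, the kernel's load average counts tasks in uninterruptible sleep (typically waiting on disk I/O) as well as runnable ones, so an I/O-bound node can report a load of 420 while its CPUs are mostly idle. The sketch below extracts the 1-minute figure from the text of /proc/loadavg. It assumes, without confirmation from the thread, that this first field is the load value crmd compares; loadavg_1min is a hypothetical name, and the parser only handles a plain non-negative decimal.</p>
<pre>
```c
/* Return the first field of /proc/loadavg text, i.e. the 1-minute load
 * average, e.g. "420.13 410.00 400.00 3/900 12345" -> about 420.13.
 * Assumes a plain non-negative decimal; reading the file is left to
 * the caller. */
double loadavg_1min(const char *text)
{
    double value = 0.0;
    double scale = 0.1;

    /* integer part */
    while (*text >= '0' && *text <= '9') {
        value = value * 10.0 + (*text - '0');
        text++;
    }
    /* fractional part, if any */
    if (*text == '.') {
        text++;
        while (*text >= '0' && *text <= '9') {
            value += (*text - '0') * scale;
            scale /= 10.0;
            text++;
        }
    }
    return value;
}
```
</pre>
<p class="MsoNormal">Because this figure comes from the kernel, not from crmd, the I/O wait contribution is the same on Debian as on any other Linux distribution.</p>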


</div>


</blockquote>


</div>


<p class="MsoNormal"><o:p> </o:p></p>


</div>


</div>


</body>


</html>