[Pacemaker] The larger cluster is tested.

yusuke iida yusk.iida at gmail.com
Thu Nov 21 05:52:28 UTC 2013


Hi, Andrew

I understand.

A further concern: with a lower batch-limit, cluster operations may become too slow.
I will look into avoiding that by adjusting the parameters or changing how the test is driven.

Thank you for the various adjustments.
Yusuke
2013/11/19 Andrew Beekhof <andrew at beekhof.net>:
>
> On 16 Nov 2013, at 12:22 am, yusuke iida <yusk.iida at gmail.com> wrote:
>
>> Hi, Andrew
>>
>> Thanks for the various suggestions.
>>
>> To determine a suitable batch-limit, I tested with batch-limit fixed
>> to 1, 2, 3, and 4 from the start.
>>
>> In my environment the results were as follows:
>> batch-limit=1 and 2: no timeouts.
>> batch-limit=3: 1 timeout.
>> batch-limit=4: 5 timeouts.
>>
>> From these results, I think the limit in "limit = QB_MAX(1, peers / 4)"
>> is still too high.
>
> Remember these results are specific to your (virtual) hardware and configured timeouts.
> I would argue that 5 timeouts out of 2853 actions is actually quite impressive for a default value in this sort of situation.[1]
>
> Some tuning in a cluster of this kind is to be expected.
>
> [1] It took crm_simulate 4 minutes to even pretend to perform all those operations.
>
>>
>> So I created a fix that pins batch-limit to 2 when a node enters the
>> extreme state.
>> https://github.com/yuusuke/pacemaker/commit/efe2d6ebc55be39b8be43de38e7662f039b61dec
>>
>> After running the test several times, it seems to work without problems.
>>
>> The reports from testing with batch-limit fixed are below.
>> batch-limit=1
>> https://drive.google.com/file/d/0BwMFJItoO-fVNk8wTGlYNjNnSHc/edit?usp=sharing
>> batch-limit=2
>> https://drive.google.com/file/d/0BwMFJItoO-fVTnc4bXY2YXF2M2M/edit?usp=sharing
>> batch-limit=3
>> https://drive.google.com/file/d/0BwMFJItoO-fVYl9Gbks2VlJMR0k/edit?usp=sharing
>> batch-limit=4
>> https://drive.google.com/file/d/0BwMFJItoO-fVZnJIazd5MFQ1aGs/edit?usp=sharing
>>
>> The report from a run driven by my test code is the following.
>> https://drive.google.com/file/d/0BwMFJItoO-fVbzB0NjFLeVY3Zmc/edit?usp=sharing
>>
>> Regards,
>> Yusuke
>>
>> 2013/11/13 Andrew Beekhof <andrew at beekhof.net>:
>>> Did you look at the load numbers in the logs?
>>> The CPUs are being slammed for over 20 minutes.
>>>
>>> The automatic tuning can only help so much, you're simply asking the cluster to do more work than it is capable of.
>>> Giving more priority to cib operations that come via IPC is one option, but as I explained earlier, it comes at the cost of correctness.
>>>
>>> Given the huge mismatch between the nodes' capacity and the tasks you're asking them to achieve, your best path forward is probably setting a load-threshold < 40% or a batch-limit <= 8.
>>> Or we could try a patch like the one below if we think that the defaults are not aggressive enough.
>>>
>>> diff --git a/crmd/throttle.c b/crmd/throttle.c
>>> index d77195a..7636d4a 100644
>>> --- a/crmd/throttle.c
>>> +++ b/crmd/throttle.c
>>> @@ -611,14 +611,14 @@ throttle_get_total_job_limit(int l)
>>>         switch(r->mode) {
>>>
>>>             case throttle_extreme:
>>> -                if(limit == 0 || limit > peers/2) {
>>> -                    limit = peers/2;
>>> +                if(limit == 0 || limit > peers/4) {
>>> +                    limit = QB_MAX(1, peers/4);
>>>                 }
>>>                 break;
>>>
>>>             case throttle_high:
>>> -                if(limit == 0 || limit > peers) {
>>> -                    limit = peers;
>>> +                if(limit == 0 || limit > peers/2) {
>>> +                    limit = QB_MAX(1, peers/2);
>>>                 }
>>>                 break;
>>>             default:
>>>
>>> This may also be worthwhile:
>>>
>>> diff --git a/crmd/throttle.c b/crmd/throttle.c
>>> index d77195a..586513a 100644
>>> --- a/crmd/throttle.c
>>> +++ b/crmd/throttle.c
>>> @@ -387,22 +387,36 @@ static bool throttle_io_load(float *load, unsigned int *blocked)
>>> }
>>>
>>> static enum throttle_state_e
>>> -throttle_handle_load(float load, const char *desc)
>>> +throttle_handle_load(float load, const char *desc, int cores)
>>> {
>>> -    if(load > THROTTLE_FACTOR_HIGH * throttle_load_target) {
>>> +    float adjusted_load = load;
>>> +
>>> +    if(cores <= 0) {
>>> +        /* No adjusting of the supplied load value */
>>> +
>>> +    } else if(cores == 1) {
>>> +        /* On a single core machine, a load of 1.0 is already too high */
>>> +        adjusted_load = load * THROTTLE_FACTOR_MEDIUM;
>>> +
>>> +    } else {
>>> +        /* Normalize the load to be per-core */
>>> +        adjusted_load = load / cores;
>>> +    }
>>> +
>>> +    if(adjusted_load > THROTTLE_FACTOR_HIGH * throttle_load_target) {
>>>         crm_notice("High %s detected: %f", desc, load);
>>>         return throttle_high;
>>>
>>> -    } else if(load > THROTTLE_FACTOR_MEDIUM * throttle_load_target) {
>>> +    } else if(adjusted_load > THROTTLE_FACTOR_MEDIUM * throttle_load_target) {
>>>         crm_info("Moderate %s detected: %f", desc, load);
>>>         return throttle_med;
>>>
>>> -    } else if(load > THROTTLE_FACTOR_LOW * throttle_load_target) {
>>> +    } else if(adjusted_load > THROTTLE_FACTOR_LOW * throttle_load_target) {
>>>         crm_debug("Noticable %s detected: %f", desc, load);
>>>         return throttle_low;
>>>     }
>>>
>>> -    crm_trace("Negligable %s detected: %f", desc, load);
>>> +    crm_trace("Negligable %s detected: %f", desc, adjusted_load);
>>>     return throttle_none;
>>> }
>>>
>>> @@ -464,22 +478,12 @@ throttle_mode(void)
>>>     }
>>>
>>>     if(throttle_load_avg(&load)) {
>>> -        float simple = load / cores;
>>> -        mode |= throttle_handle_load(simple, "CPU load");
>>> +        mode |= throttle_handle_load(load, "CPU load", cores);
>>>     }
>>>
>>>     if(throttle_io_load(&load, &blocked)) {
>>> -        float blocked_ratio = 0.0;
>>> -
>>> -        mode |= throttle_handle_load(load, "IO load");
>>> -
>>> -        if(cores) {
>>> -            blocked_ratio = blocked / cores;
>>> -        } else {
>>> -            blocked_ratio = blocked;
>>> -        }
>>> -
>>> -        mode |= throttle_handle_load(blocked_ratio, "blocked IO ratio");
>>> +        mode |= throttle_handle_load(load, "IO load", 0);
>>> +        mode |= throttle_handle_load(blocked, "blocked IO ratio", cores);
>>>     }
>>>
>>>     if(mode & throttle_extreme) {
>>>
>>>
>>>
>>>
>>> On 12 Nov 2013, at 3:25 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>>
>>>> Hi, Andrew
>>>>
>>>> I'm sorry.
>>>> This report was a thing when two cores were assigned to the virtual machine.
>>>> https://drive.google.com/file/d/0BwMFJItoO-fVdlIwTVdFOGRkQ0U/edit?usp=sharing
>>>>
>>>> I'm sorry to be misleading.
>>>>
>>>> This is the report acquired with one core.
>>>> https://drive.google.com/file/d/0BwMFJItoO-fVSlo0dE0xMzNORGc/edit?usp=sharing
>>>>
>>>> It does not define the LRMD_MAX_CHILDREN on any node.
>>>> load-threshold is still default.
>>>> cib_max_cpu is set to 0.4 by the following processing.
>>>>
>>>>       if(cores == 1) {
>>>>           cib_max_cpu = 0.4;
>>>>       }
>>>>
>>>> so if the CIB load exceeds 60%, the node enters the Extreme state:
>>>> Nov 08 11:08:31 [2390] vm01       crmd: (  throttle.c:441   )  notice:
>>>> throttle_mode:        Extreme CIB load detected: 0.670000
>>>>
>>>> Shortly afterwards, the DC detects that vm01 is in the Extreme state.
>>>> Nov 08 11:08:32 [2387] vm13       crmd: (  throttle.c:701   )   debug:
>>>> throttle_update:     Host vm01 supports a maximum of 2 jobs and
>>>> throttle mode 1000.  New job limit is 1
>>>>
>>>> The following log shows that the dynamic adjustment of batch-limit is
>>>> also working as expected.
>>>> # grep "throttle_get_total_job_limit" pacemaker.log
>>>> (snip)
>>>> Nov 08 11:08:31 [2387] vm13       crmd: (  throttle.c:629   )   trace:
>>>> throttle_get_total_job_limit:    No change to batch-limit=0
>>>> Nov 08 11:08:32 [2387] vm13       crmd: (  throttle.c:632   )   trace:
>>>> throttle_get_total_job_limit:    Using batch-limit=8
>>>> (snip)
>>>> Nov 08 11:10:32 [2387] vm13       crmd: (  throttle.c:632   )   trace:
>>>> throttle_get_total_job_limit:    Using batch-limit=16
>>>>
>>>> The above shows that the problem is not solved even when the total
>>>> number of jobs is restricted by batch-limit.
>>>> Are there any other ways to reduce the synchronization messages?
>>>>
>>>> There are not that many internal IPC messages.
>>>> Could they be processed, even a little at a time, while the
>>>> synchronization messages are being handled?
>>>>
>>>> Regards,
>>>> Yusuke
>>>>
>>>> 2013/11/12 Andrew Beekhof <andrew at beekhof.net>:
>>>>>
>>>>> On 11 Nov 2013, at 11:48 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>>>>
>>>>>> I also checked the execution of the transition graph.
>>>>>> Since the number of pending actions is capped at 16 partway through,
>>>>>> I judge that batch-limit is taking effect.
>>>>>> However, even with jobs restricted by batch-limit, two or more jobs
>>>>>> are always fired per second.
>>>>>> Each completed job returns a result, which generates CIB
>>>>>> synchronization messages.
>>>>>> A node that keeps receiving synchronization messages processes them
>>>>>> preferentially and postpones internal IPC messages.
>>>>>> I think that is what caused the timeouts.
>>>>>
>>>>> What load-threshold were you running this with?
>>>>>
>>>>> I see this in the logs:
>>>>> "Host vm10 supports a maximum of 4 jobs and throttle mode 0100.  New job limit is 1"
>>>>>
>>>>> Have you set LRMD_MAX_CHILDREN=4 on these nodes?
>>>>> I wouldn't recommend that for a single core VM.  I'd let the default of 2*cores be used.
>>>>>
>>>>>
>>>>> Also, I'm not seeing "Extreme CIB load detected".  Are these still single core machines?
>>>>> If so it would suggest that something about:
>>>>>
>>>>>       if(cores == 1) {
>>>>>           cib_max_cpu = 0.4;
>>>>>       }
>>>>>       if(throttle_load_target > 0.0 && throttle_load_target < cib_max_cpu) {
>>>>>           cib_max_cpu = throttle_load_target;
>>>>>       }
>>>>>
>>>>>       if(load > 1.5 * cib_max_cpu) {
>>>>>           /* Can only happen on machines with a low number of cores */
>>>>>           crm_notice("Extreme %s detected: %f", desc, load);
>>>>>           mode |= throttle_extreme;
>>>>>
>>>>> is wrong.
>>>>>
>>>>> What was load-threshold configured as?
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ----------------------------------------
>>>> METRO SYSTEMS CO., LTD
>>>>
>>>> Yusuke Iida
>>>> Mail: yusk.iida at gmail.com
>>>> ----------------------------------------
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>



-- 
----------------------------------------
METRO SYSTEMS CO., LTD

Yusuke Iida
Mail: yusk.iida at gmail.com
----------------------------------------



