[Pacemaker] The larger cluster is tested.
Andrew Beekhof
andrew at beekhof.net
Mon Nov 18 19:01:00 EST 2013
On 16 Nov 2013, at 12:22 am, yusuke iida <yusk.iida at gmail.com> wrote:
> Hi, Andrew
>
> Thanks for the various suggestions.
>
> To find out which batch-limit is suitable, I ran the test with
> batch-limit fixed at 1, 2, 3, and 4 from the start.
>
> In my environment the results were as follows.
> batch-limit=1 and 2: no timeouts.
> batch-limit=3: 1 timeout.
> batch-limit=4: 5 timeouts.
>
> From the above results, I think the limit given by
> "limit = QB_MAX(1, peers/4)" is still too high.
Remember these results are specific to your (virtual) hardware and configured timeouts.
I would argue that 5 timeouts out of 2853 actions is actually quite impressive for a default value in this sort of situation.[1]
Some tuning in a cluster of this kind is to be expected.
[1] It took crm_simulate 4 minutes to even pretend to perform all those operations.
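For reference, the arithmetic behind the limits discussed here can be sketched as below. This is a standalone illustration, not the code in crmd/throttle.c: SKETCH_MAX stands in for libqb's QB_MAX, the enum and function names are made up for the example, and a peer count of 16 is only an assumption inferred from the "Using batch-limit=8" and "Using batch-limit=16" log lines quoted further down.

```c
#include <assert.h>

/* Stand-in for libqb's QB_MAX so the sketch has no dependencies. */
#define SKETCH_MAX(a, b) (((a) > (b)) ? (a) : (b))

/* Hypothetical names mirroring the throttle_high/throttle_extreme cases. */
enum sketch_mode { SKETCH_NONE, SKETCH_HIGH, SKETCH_EXTREME };

/*
 * Cluster-wide job limit as computed by the proposed patch.
 * configured == 0 means no batch-limit has been configured.
 */
int
sketch_total_job_limit(int configured, int peers, enum sketch_mode mode)
{
    int limit = configured;

    assert(peers >= 0);

    switch (mode) {
        case SKETCH_EXTREME:
            /* Cap at a quarter of the peers, but never below 1 */
            if (limit == 0 || limit > peers / 4) {
                limit = SKETCH_MAX(1, peers / 4);
            }
            break;
        case SKETCH_HIGH:
            /* Cap at half of the peers, but never below 1 */
            if (limit == 0 || limit > peers / 2) {
                limit = SKETCH_MAX(1, peers / 2);
            }
            break;
        default:
            /* Leave the configured value untouched */
            break;
    }
    return limit;
}
```

Under the 16-peer assumption this yields 4 in the extreme case and 8 in the high case, and a configured batch-limit of 2 is left alone because it is already below the cap.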
>
> So I have created a fix that pins batch-limit to 2 when the extreme
> state is reached.
> https://github.com/yuusuke/pacemaker/commit/efe2d6ebc55be39b8be43de38e7662f039b61dec
>
> I ran the test several times and it seems to work without problems.
>
> The reports from the tests with a fixed batch-limit are below.
> batch-limit=1
> https://drive.google.com/file/d/0BwMFJItoO-fVNk8wTGlYNjNnSHc/edit?usp=sharing
> batch-limit=2
> https://drive.google.com/file/d/0BwMFJItoO-fVTnc4bXY2YXF2M2M/edit?usp=sharing
> batch-limit=3
> https://drive.google.com/file/d/0BwMFJItoO-fVYl9Gbks2VlJMR0k/edit?usp=sharing
> batch-limit=4
> https://drive.google.com/file/d/0BwMFJItoO-fVZnJIazd5MFQ1aGs/edit?usp=sharing
>
> The report from the run with my test code is the following.
> https://drive.google.com/file/d/0BwMFJItoO-fVbzB0NjFLeVY3Zmc/edit?usp=sharing
>
> Regards,
> Yusuke
>
> 2013/11/13 Andrew Beekhof <andrew at beekhof.net>:
>> Did you look at the load numbers in the logs?
>> The CPUs are being slammed for over 20 minutes.
>>
>> The automatic tuning can only help so much; you're simply asking the cluster to do more work than it is capable of.
>> Giving more priority to cib operations that come via IPC is one option, but as I explained earlier, it comes at the cost of correctness.
>>
>> Given the huge mismatch between the nodes' capacity and the tasks you're asking them to achieve, your best path forward is probably setting a load-threshold < 40% or a batch-limit <= 8.
>> Or we could try a patch like the one below if we think that the defaults are not aggressive enough.
>>
>> diff --git a/crmd/throttle.c b/crmd/throttle.c
>> index d77195a..7636d4a 100644
>> --- a/crmd/throttle.c
>> +++ b/crmd/throttle.c
>> @@ -611,14 +611,14 @@ throttle_get_total_job_limit(int l)
>> switch(r->mode) {
>>
>> case throttle_extreme:
>> - if(limit == 0 || limit > peers/2) {
>> - limit = peers/2;
>> + if(limit == 0 || limit > peers/4) {
>> + limit = QB_MAX(1, peers/4);
>> }
>> break;
>>
>> case throttle_high:
>> - if(limit == 0 || limit > peers) {
>> - limit = peers;
>> + if(limit == 0 || limit > peers/2) {
>> + limit = QB_MAX(1, peers/2);
>> }
>> break;
>> default:
>>
>> This may also be worthwhile:
>>
>> diff --git a/crmd/throttle.c b/crmd/throttle.c
>> index d77195a..586513a 100644
>> --- a/crmd/throttle.c
>> +++ b/crmd/throttle.c
>> @@ -387,22 +387,36 @@ static bool throttle_io_load(float *load, unsigned int *blocked)
>> }
>>
>> static enum throttle_state_e
>> -throttle_handle_load(float load, const char *desc)
>> +throttle_handle_load(float load, const char *desc, int cores)
>> {
>> - if(load > THROTTLE_FACTOR_HIGH * throttle_load_target) {
>> + float adjusted_load = load;
>> +
>> + if(cores <= 0) {
>> + /* No adjusting of the supplied load value */
>> +
>> + } else if(cores == 1) {
>> + /* On a single core machine, a load of 1.0 is already too high */
>> + adjusted_load = load * THROTTLE_FACTOR_MEDIUM;
>> +
>> + } else {
>> + /* Normalize the load to be per-core */
>> + adjusted_load = load / cores;
>> + }
>> +
>> + if(adjusted_load > THROTTLE_FACTOR_HIGH * throttle_load_target) {
>> crm_notice("High %s detected: %f", desc, load);
>> return throttle_high;
>>
>> - } else if(load > THROTTLE_FACTOR_MEDIUM * throttle_load_target) {
>> + } else if(adjusted_load > THROTTLE_FACTOR_MEDIUM * throttle_load_target) {
>> crm_info("Moderate %s detected: %f", desc, load);
>> return throttle_med;
>>
>> - } else if(load > THROTTLE_FACTOR_LOW * throttle_load_target) {
>> + } else if(adjusted_load > THROTTLE_FACTOR_LOW * throttle_load_target) {
>> crm_debug("Noticable %s detected: %f", desc, load);
>> return throttle_low;
>> }
>>
>> - crm_trace("Negligable %s detected: %f", desc, load);
>> + crm_trace("Negligable %s detected: %f", desc, adjusted_load);
>> return throttle_none;
>> }
>>
>> @@ -464,22 +478,12 @@ throttle_mode(void)
>> }
>>
>> if(throttle_load_avg(&load)) {
>> - float simple = load / cores;
>> - mode |= throttle_handle_load(simple, "CPU load");
>> + mode |= throttle_handle_load(load, "CPU load", cores);
>> }
>>
>> if(throttle_io_load(&load, &blocked)) {
>> - float blocked_ratio = 0.0;
>> -
>> - mode |= throttle_handle_load(load, "IO load");
>> -
>> - if(cores) {
>> - blocked_ratio = blocked / cores;
>> - } else {
>> - blocked_ratio = blocked;
>> - }
>> -
>> - mode |= throttle_handle_load(blocked_ratio, "blocked IO ratio");
>> + mode |= throttle_handle_load(load, "IO load", 0);
>> + mode |= throttle_handle_load(blocked, "blocked IO ratio", cores);
>> }
>>
>> if(mode & throttle_extreme) {
>>
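The per-core adjustment in the second patch above can be tried in isolation with a sketch like the one below. The function name is hypothetical, and since the value of THROTTLE_FACTOR_MEDIUM is not shown in this thread, it is passed in as a parameter rather than assumed.

```c
#include <assert.h>

/*
 * Returns the load value that would be compared against the thresholds.
 *   cores <= 0: use the load as supplied (the patch passes 0 for IO load)
 *   cores == 1: scale by factor_medium, as the patch does with
 *               THROTTLE_FACTOR_MEDIUM
 *   otherwise:  normalize the load to a per-core figure
 */
float
sketch_adjusted_load(float load, int cores, float factor_medium)
{
    assert(load >= 0.0f);

    if (cores <= 0) {
        return load;
    }
    if (cores == 1) {
        return load * factor_medium;
    }
    return load / (float) cores;
}
```

Note that in the patch the log messages for the high, moderate, and noticeable cases still print the raw load, so the logged number and the number actually compared can differ on single-core and multi-core machines.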
>>
>>
>>
>> On 12 Nov 2013, at 3:25 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>
>>> Hi, Andrew
>>>
>>> I'm sorry.
>>> The following report was taken when two cores were assigned to the virtual machine.
>>> https://drive.google.com/file/d/0BwMFJItoO-fVdlIwTVdFOGRkQ0U/edit?usp=sharing
>>>
>>> Sorry for the confusion.
>>>
>>> This is the report taken with one core.
>>> https://drive.google.com/file/d/0BwMFJItoO-fVSlo0dE0xMzNORGc/edit?usp=sharing
>>>
>>> LRMD_MAX_CHILDREN is not defined on any node.
>>> load-threshold is still at its default.
>>> cib_max_cpu is set to 0.4 by the following code.
>>>
>>> if(cores == 1) {
>>> cib_max_cpu = 0.4;
>>> }
>>>
>>> So if the CIB load exceeds 60% (1.5 * 0.4), the node enters the Extreme state.
>>> Nov 08 11:08:31 [2390] vm01 crmd: ( throttle.c:441 ) notice:
>>> throttle_mode: Extreme CIB load detected: 0.670000
>>>
>>> A moment later, the DC detects that vm01 is in the Extreme state.
>>> Nov 08 11:08:32 [2387] vm13 crmd: ( throttle.c:701 ) debug:
>>> throttle_update: Host vm01 supports a maximum of 2 jobs and
>>> throttle mode 1000. New job limit is 1
>>>
>>> From the following log, the dynamic adjustment of batch-limit also
>>> seems to be working correctly.
>>> # grep "throttle_get_total_job_limit" pacemaker.log
>>> (snip)
>>> Nov 08 11:08:31 [2387] vm13 crmd: ( throttle.c:629 ) trace:
>>> throttle_get_total_job_limit: No change to batch-limit=0
>>> Nov 08 11:08:32 [2387] vm13 crmd: ( throttle.c:632 ) trace:
>>> throttle_get_total_job_limit: Using batch-limit=8
>>> (snip)
>>> Nov 08 11:10:32 [2387] vm13 crmd: ( throttle.c:632 ) trace:
>>> throttle_get_total_job_limit: Using batch-limit=16
>>>
>>> The above shows that the problem is not solved even when the total
>>> number of jobs is restricted by batch-limit.
>>> Is there any other way to reduce the number of synchronization messages?
>>>
>>> There are not many internal IPC messages.
>>> Would it be possible to handle at least a few of them while the
>>> synchronization messages are being processed?
>>>
>>> Regards,
>>> Yusuke
>>>
>>> 2013/11/12 Andrew Beekhof <andrew at beekhof.net>:
>>>>
>>>> On 11 Nov 2013, at 11:48 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>>>
>>>>> I also checked the execution of the graph.
>>>>> From partway through, the number of pending actions is capped at 16,
>>>>> so I conclude that batch-limit is taking effect.
>>>>> However, even when jobs are restricted by batch-limit, two or
>>>>> more jobs are still fired every second.
>>>>> Each completed job returns a result, which generates a CIB
>>>>> synchronization message.
>>>>> A node that keeps receiving synchronization messages processes
>>>>> them preferentially and postpones internal IPC messages.
>>>>> I think that is what caused the timeouts.
>>>>
>>>> What load-threshold were you running this with?
>>>>
>>>> I see this in the logs:
>>>> "Host vm10 supports a maximum of 4 jobs and throttle mode 0100. New job limit is 1"
>>>>
>>>> Have you set LRMD_MAX_CHILDREN=4 on these nodes?
>>>> I wouldn't recommend that for a single core VM. I'd let the default of 2*cores be used.
>>>>
>>>>
>>>> Also, I'm not seeing "Extreme CIB load detected". Are these still single core machines?
>>>> If so it would suggest that something about:
>>>>
>>>> if(cores == 1) {
>>>> cib_max_cpu = 0.4;
>>>> }
>>>> if(throttle_load_target > 0.0 && throttle_load_target < cib_max_cpu) {
>>>> cib_max_cpu = throttle_load_target;
>>>> }
>>>>
>>>> if(load > 1.5 * cib_max_cpu) {
>>>> /* Can only happen on machines with a low number of cores */
>>>> crm_notice("Extreme %s detected: %f", desc, load);
>>>> mode |= throttle_extreme;
>>>>
>>>> is wrong.
>>>>
>>>> What was load-threshold configured as?
>>>>
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>>
>>>
>>>
>>>
>>> --
>>> ----------------------------------------
>>> METRO SYSTEMS CO., LTD
>>>
>>> Yusuke Iida
>>> Mail: yusk.iida at gmail.com
>>> ----------------------------------------
>>>
>>
>>
>>
>
>
>
>