[Pacemaker] The larger cluster is tested.
yusuke iida
yusk.iida at gmail.com
Fri Nov 15 08:22:25 EST 2013
Hi, Andrew
Thanks for the various suggestions.
To find a suitable batch-limit, I fixed batch-limit at 1, 2, 3, and 4 from the
start and tested each value.
The results in my environment were roughly as follows:
batch-limit=1 and 2: no timeouts.
batch-limit=3: 1 timeout.
batch-limit=4: 5 timeouts.
From the above results, I think the limit "limit = QB_MAX(1, peers/4)" is
still too high.
So I have created a fix that fixes batch-limit at 2 when the throttle state
becomes extreme.
https://github.com/yuusuke/pacemaker/commit/efe2d6ebc55be39b8be43de38e7662f039b61dec
After several test runs, it seems to work without problems.
Below are the reports from the runs with batch-limit fixed at each value:
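A minimal C sketch contrasting the two extreme-state rules: the QB_MAX(1, peers/4) cap from the proposed throttle.c patch, and a fixed cap of 2. This is not the code from the linked commit; it assumes the fix caps (rather than forces) the limit at 2, and that a limit of 0 means "not yet set", as in throttle_get_total_job_limit(). The function names and FIXED_EXTREME_LIMIT are hypothetical, and MAX stands in for libqb's QB_MAX().

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-in for libqb's QB_MAX(). */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Extreme-state cap from the proposed patch: a quarter of the peers,
 * but never less than one job. limit == 0 means "not yet set". */
static int extreme_limit_quarter(int limit, int peers)
{
    assert(peers >= 0);
    if (limit == 0 || limit > peers / 4) {
        limit = MAX(1, peers / 4);
    }
    return limit;
}

/* Assumed behaviour of the fix: in the extreme state the total job limit
 * is capped at 2, independent of the number of peers (hypothetical name). */
#define FIXED_EXTREME_LIMIT 2

static int extreme_limit_fixed(int limit)
{
    if (limit == 0 || limit > FIXED_EXTREME_LIMIT) {
        limit = FIXED_EXTREME_LIMIT;
    }
    return limit;
}

/* Print both caps for a few example cluster sizes. */
static void compare_limits(void)
{
    const int sizes[] = { 4, 8, 16, 32 };
    for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        printf("peers=%2d  peers/4 rule=%d  fixed rule=%d\n",
               sizes[i], extreme_limit_quarter(0, sizes[i]),
               extreme_limit_fixed(0));
    }
}
```

For example, with 16 peers the peers/4 rule still allows 4 concurrent jobs, while the fixed rule allows 2, the range where the tests above saw no timeouts. Note that with fewer than 8 peers the fixed cap of 2 is actually looser than peers/4.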
batch-limit=1
https://drive.google.com/file/d/0BwMFJItoO-fVNk8wTGlYNjNnSHc/edit?usp=sharing
batch-limit=2
https://drive.google.com/file/d/0BwMFJItoO-fVTnc4bXY2YXF2M2M/edit?usp=sharing
batch-limit=3
https://drive.google.com/file/d/0BwMFJItoO-fVYl9Gbks2VlJMR0k/edit?usp=sharing
batch-limit=4
https://drive.google.com/file/d/0BwMFJItoO-fVZnJIazd5MFQ1aGs/edit?usp=sharing
Here is the report from a run using my test code:
https://drive.google.com/file/d/0BwMFJItoO-fVbzB0NjFLeVY3Zmc/edit?usp=sharing
Regards,
Yusuke
2013/11/13 Andrew Beekhof <andrew at beekhof.net>:
> Did you look at the load numbers in the logs?
> The CPUs are being slammed for over 20 minutes.
>
> The automatic tuning can only help so much, you're simply asking the cluster to do more work than it is capable of.
> Giving more priority to cib operations that come via IPC is one option, but as I explained earlier, it comes at the cost of correctness.
>
> Given the huge mismatch between the nodes' capacity and the tasks you're asking them to achieve, your best path forward is probably setting a load-threshold < 40% or a batch-limit <= 8.
> Or we could try a patch like the one below if we think that the defaults are not aggressive enough.
>
> diff --git a/crmd/throttle.c b/crmd/throttle.c
> index d77195a..7636d4a 100644
> --- a/crmd/throttle.c
> +++ b/crmd/throttle.c
> @@ -611,14 +611,14 @@ throttle_get_total_job_limit(int l)
> switch(r->mode) {
>
> case throttle_extreme:
> - if(limit == 0 || limit > peers/2) {
> - limit = peers/2;
> + if(limit == 0 || limit > peers/4) {
> + limit = QB_MAX(1, peers/4);
> }
> break;
>
> case throttle_high:
> - if(limit == 0 || limit > peers) {
> - limit = peers;
> + if(limit == 0 || limit > peers/2) {
> + limit = QB_MAX(1, peers/2);
> }
> break;
> default:
>
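The effect of this first patch on the cluster-wide job limit can be tabulated with a small C sketch. It mirrors only the switch statement in the diff; the enum and function names are hypothetical, a limit of 0 is assumed to mean "not yet set", and MAX stands in for libqb's QB_MAX().

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-in for libqb's QB_MAX(). */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical subset of the throttle modes handled by the switch. */
enum mode { MODE_HIGH, MODE_EXTREME };

/* Rule before the patch: extreme -> peers/2, high -> peers. */
static int job_limit_old(enum mode m, int limit, int peers)
{
    assert(peers >= 0);
    int cap = (m == MODE_EXTREME) ? peers / 2 : peers;
    if (limit == 0 || limit > cap) {
        limit = cap;
    }
    return limit;
}

/* Rule after the patch: extreme -> peers/4, high -> peers/2, never below 1. */
static int job_limit_new(enum mode m, int limit, int peers)
{
    assert(peers >= 0);
    int cap = (m == MODE_EXTREME) ? peers / 4 : peers / 2;
    if (limit == 0 || limit > cap) {
        limit = MAX(1, cap);
    }
    return limit;
}

/* Print old and new limits side by side for a few cluster sizes. */
static void print_limit_table(void)
{
    const int sizes[] = { 2, 8, 16, 32 };
    printf("peers  old-extreme  new-extreme  old-high  new-high\n");
    for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        int p = sizes[i];
        printf("%5d  %11d  %11d  %8d  %8d\n", p,
               job_limit_old(MODE_EXTREME, 0, p), job_limit_new(MODE_EXTREME, 0, p),
               job_limit_old(MODE_HIGH, 0, p), job_limit_new(MODE_HIGH, 0, p));
    }
}
```

For example, with 16 peers the limits drop from 8 to 4 (extreme) and from 16 to 8 (high). The QB_MAX(1, ...) guard also means the new rule never yields 0, whereas the old extreme rule does for a single peer.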
> This may also be worthwhile:
>
> diff --git a/crmd/throttle.c b/crmd/throttle.c
> index d77195a..586513a 100644
> --- a/crmd/throttle.c
> +++ b/crmd/throttle.c
> @@ -387,22 +387,36 @@ static bool throttle_io_load(float *load, unsigned int *blocked)
> }
>
> static enum throttle_state_e
> -throttle_handle_load(float load, const char *desc)
> +throttle_handle_load(float load, const char *desc, int cores)
> {
> - if(load > THROTTLE_FACTOR_HIGH * throttle_load_target) {
> + float adjusted_load = load;
> +
> + if(cores <= 0) {
> + /* No adjusting of the supplied load value */
> +
> + } else if(cores == 1) {
> + /* On a single core machine, a load of 1.0 is already too high */
> + adjusted_load = load * THROTTLE_FACTOR_MEDIUM;
> +
> + } else {
> + /* Normalize the load to be per-core */
> + adjusted_load = load / cores;
> + }
> +
> + if(adjusted_load > THROTTLE_FACTOR_HIGH * throttle_load_target) {
> crm_notice("High %s detected: %f", desc, load);
> return throttle_high;
>
> - } else if(load > THROTTLE_FACTOR_MEDIUM * throttle_load_target) {
> + } else if(adjusted_load > THROTTLE_FACTOR_MEDIUM * throttle_load_target) {
> crm_info("Moderate %s detected: %f", desc, load);
> return throttle_med;
>
> - } else if(load > THROTTLE_FACTOR_LOW * throttle_load_target) {
> + } else if(adjusted_load > THROTTLE_FACTOR_LOW * throttle_load_target) {
> crm_debug("Noticable %s detected: %f", desc, load);
> return throttle_low;
> }
>
> - crm_trace("Negligable %s detected: %f", desc, load);
> + crm_trace("Negligable %s detected: %f", desc, adjusted_load);
> return throttle_none;
> }
>
> @@ -464,22 +478,12 @@ throttle_mode(void)
> }
>
> if(throttle_load_avg(&load)) {
> - float simple = load / cores;
> - mode |= throttle_handle_load(simple, "CPU load");
> + mode |= throttle_handle_load(load, "CPU load", cores);
> }
>
> if(throttle_io_load(&load, &blocked)) {
> - float blocked_ratio = 0.0;
> -
> - mode |= throttle_handle_load(load, "IO load");
> -
> - if(cores) {
> - blocked_ratio = blocked / cores;
> - } else {
> - blocked_ratio = blocked;
> - }
> -
> - mode |= throttle_handle_load(blocked_ratio, "blocked IO ratio");
> + mode |= throttle_handle_load(load, "IO load", 0);
> + mode |= throttle_handle_load(blocked, "blocked IO ratio", cores);
> }
>
> if(mode & throttle_extreme) {
>
>
>
>
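The core of the second patch is the per-core adjustment applied before the load is compared with the thresholds. A minimal C sketch of that logic follows. The factor values are hypothetical placeholders (the real THROTTLE_FACTOR_* constants are not shown in the patch), the 0.8 target in the example assumes load-threshold=80%, and all names are illustrative.

```c
#include <assert.h>

/* Hypothetical factor values; the real THROTTLE_FACTOR_* constants live
 * in crmd/throttle.c and are not part of the patch shown. */
#define FACTOR_LOW    1.2f
#define FACTOR_MEDIUM 1.6f
#define FACTOR_HIGH   2.0f

enum load_level { LOAD_NONE, LOAD_LOW, LOAD_MED, LOAD_HIGH };

/* Adjust the raw load the same way the patched throttle_handle_load() does. */
static float adjust_load(float load, int cores)
{
    if (cores <= 0) {
        return load;                 /* caller asked for no adjustment */
    } else if (cores == 1) {
        return load * FACTOR_MEDIUM; /* single core: 1.0 is already too high */
    }
    return load / cores;             /* normalise to a per-core value */
}

/* Classify the adjusted load against the target, highest level first. */
static enum load_level classify_load(float load, int cores, float target)
{
    assert(target > 0.0f);
    float adjusted = adjust_load(load, cores);

    if (adjusted > FACTOR_HIGH * target) {
        return LOAD_HIGH;
    } else if (adjusted > FACTOR_MEDIUM * target) {
        return LOAD_MED;
    } else if (adjusted > FACTOR_LOW * target) {
        return LOAD_LOW;
    }
    return LOAD_NONE;
}
```

Under these assumed values, a load of 1.1 on a single core is classified as high, whereas leaving it unadjusted (equivalent to dividing by one core, as the pre-patch code did) classifies it only as low.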
> On 12 Nov 2013, at 3:25 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>
>> Hi, Andrew
>>
>> I'm sorry.
>> That report was taken when two cores were assigned to the virtual machine.
>> https://drive.google.com/file/d/0BwMFJItoO-fVdlIwTVdFOGRkQ0U/edit?usp=sharing
>>
>> I'm sorry for the confusion.
>>
>> This is the report acquired with one core.
>> https://drive.google.com/file/d/0BwMFJItoO-fVSlo0dE0xMzNORGc/edit?usp=sharing
>>
>> LRMD_MAX_CHILDREN is not defined on any node.
>> load-threshold is still at its default.
>> cib_max_cpu is set to 0.4 by the following code:
>>
>> if(cores == 1) {
>> cib_max_cpu = 0.4;
>> }
>>
>> Since the Extreme threshold is 1.5 * cib_max_cpu = 0.6, a CIB load above 60% puts the node into the Extreme state:
>> Nov 08 11:08:31 [2390] vm01 crmd: ( throttle.c:441 ) notice: throttle_mode: Extreme CIB load detected: 0.670000
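The arithmetic behind that log line can be checked with a short C sketch that mirrors the cib_max_cpu selection and the 1.5 * cib_max_cpu test quoted elsewhere in this thread. The starting value of cib_max_cpu for multi-core machines is not shown here, so it is passed in as a parameter; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirror of the quoted selection logic: single-core nodes use 0.4, and a
 * lower positive load target overrides it. 'base' is the (unknown here)
 * starting value of cib_max_cpu. */
static float effective_cib_max_cpu(float base, int cores, float load_target)
{
    assert(cores >= 0);
    float cib_max_cpu = base;
    if (cores == 1) {
        cib_max_cpu = 0.4f;
    }
    if (load_target > 0.0f && load_target < cib_max_cpu) {
        cib_max_cpu = load_target;
    }
    return cib_max_cpu;
}

/* Extreme CIB load: more than 1.5 times the allowed CIB CPU share. */
static bool cib_load_is_extreme(float load, float cib_max_cpu)
{
    return load > 1.5f * cib_max_cpu;
}
```

On a single core with the default load-threshold (assumed here to be 0.8), cib_max_cpu stays at 0.4, so the threshold is 0.6 and the logged CIB load of 0.67 counts as extreme.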
>>
>> A moment later, the DC detects that vm01 is in the Extreme state:
>> Nov 08 11:08:32 [2387] vm13 crmd: ( throttle.c:701 ) debug: throttle_update: Host vm01 supports a maximum of 2 jobs and throttle mode 1000. New job limit is 1
>>
>> The following log suggests that the dynamic adjustment of batch-limit is
>> also working correctly.
>> # grep "throttle_get_total_job_limit" pacemaker.log
>> (snip)
>> Nov 08 11:08:31 [2387] vm13 crmd: ( throttle.c:629 ) trace: throttle_get_total_job_limit: No change to batch-limit=0
>> Nov 08 11:08:32 [2387] vm13 crmd: ( throttle.c:632 ) trace: throttle_get_total_job_limit: Using batch-limit=8
>> (snip)
>> Nov 08 11:10:32 [2387] vm13 crmd: ( throttle.c:632 ) trace: throttle_get_total_job_limit: Using batch-limit=16
>>
>> This shows that restricting the total number of jobs with batch-limit
>> does not solve the problem.
>> Is there any other way to reduce the number of synchronization messages?
>>
>> There are not many internal IPC messages.
>> Couldn't at least a few of them be handled in between processing the
>> synchronization messages?
>>
>> Regards,
>> Yusuke
>>
>> 2013/11/12 Andrew Beekhof <andrew at beekhof.net>:
>>>
>>> On 11 Nov 2013, at 11:48 pm, yusuke iida <yusk.iida at gmail.com> wrote:
>>>
>>>> I also checked the execution of the transition graph.
>>>> Since the number of pending actions is capped at 16 partway through,
>>>> batch-limit appears to be effective.
>>>> However, even with jobs restricted by batch-limit, two or more jobs
>>>> are always fired every second.
>>>> When these jobs complete, their results generate CIB synchronization
>>>> messages.
>>>> A node that keeps receiving synchronization messages processes them
>>>> preferentially and postpones internal IPC messages.
>>>> I think this is what caused the timeouts.
>>>
>>> What load-threshold were you running this with?
>>>
>>> I see this in the logs:
>>> "Host vm10 supports a maximum of 4 jobs and throttle mode 0100. New job limit is 1"
>>>
>>> Have you set LRMD_MAX_CHILDREN=4 on these nodes?
>>> I wouldn't recommend that for a single core VM. I'd let the default of 2*cores be used.
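The default Andrew refers to derives the per-node job limit from the core count unless it is explicitly overridden. A minimal C sketch of that rule, assuming LRMD_MAX_CHILDREN is read from the environment as a plain positive integer (how lrmd actually parses it is not shown in this thread); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Per-node job limit: an explicit positive LRMD_MAX_CHILDREN value wins,
 * otherwise fall back to 2 * cores (at least one core is assumed). */
static int max_children(const char *env_value, long cores)
{
    if (env_value != NULL) {
        char *end = NULL;
        long v = strtol(env_value, &end, 10);
        if (end != env_value && *end == '\0' && v > 0) {
            return (int)v;
        }
    }
    if (cores < 1) {
        cores = 1; /* sysconf() failure or nonsense input */
    }
    return (int)(2 * cores);
}

/* Apply the rule to the current process environment and online CPUs. */
static int max_children_from_environment(void)
{
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    return max_children(getenv("LRMD_MAX_CHILDREN"), cores);
}
```

On a single-core VM without the variable set, this sketch gives 2 jobs, whereas the log line above reports 4 for vm10.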
>>>
>>>
>>> Also, I'm not seeing "Extreme CIB load detected". Are these still single core machines?
>>> If so, it would suggest that something about:
>>>
>>> if(cores == 1) {
>>> cib_max_cpu = 0.4;
>>> }
>>> if(throttle_load_target > 0.0 && throttle_load_target < cib_max_cpu) {
>>> cib_max_cpu = throttle_load_target;
>>> }
>>>
>>> if(load > 1.5 * cib_max_cpu) {
>>> /* Can only happen on machines with a low number of cores */
>>> crm_notice("Extreme %s detected: %f", desc, load);
>>> mode |= throttle_extreme;
>>>
>>> is wrong.
>>>
>>> What was load-threshold configured as?
>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
>>
>>
>>
>> --
>> ----------------------------------------
>> METRO SYSTEMS CO., LTD
>>
>> Yusuke Iida
>> Mail: yusk.iida at gmail.com
>> ----------------------------------------
>>
>
>
>