[Pacemaker] (LRMD|PCMK)_MAX_CHILDREN?

Andrew Beekhof andrew at beekhof.net
Thu Sep 12 00:34:02 EDT 2013


On 11/09/2013, at 9:33 PM, Lars Marowsky-Bree <lmb at suse.com> wrote:

> On 2013-09-11T19:55:38, Andrew Beekhof <andrew at beekhof.net> wrote:
> 
>>> sorry for being thick, but I can't find this in the code now. Did this
>>> slip through again in April?
>> Apparently. But before we add it, I'd like to see if we can do something coherent.
>> Having 3 (or more) different variables (batch-limit, migration-limit and this) for controlling these things doesn't seem optimal or user-friendly.
> 
> Well, they're all doing something completely different.

No, they're all crude approximations designed to stop the cluster as a whole from using up so much CPU/network/etc. that recovery introduces more failures than it resolves.

> 
> A cluster-wide limit on operations (batch-limit) limits the total
> cluster and network/storage load.
> 
> The max_children prevent a given node from being overloaded by
> concurrent operations.

At the expense of introducing other failures... such as "I fired off an action N seconds ago with a timeout < N and still haven't heard back", which was possible if batch-limit and max children were too far out of balance.
Which is why any limiting needs to happen centrally on the DC.
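The timeout failure described above can be reproduced with a minimal Python sketch. It assumes batch-limit is high enough that the DC hands every action to the node at once (the "out of balance" case), that the node runs at most max_children actions and queues the rest in FIFO order, and that the DC starts each action's timeout clock when it dispatches it. `Action`, `schedule()` and `timed_out()` are hypothetical names for illustration, not Pacemaker code.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    node: str
    duration: float  # seconds the action really takes once it starts
    timeout: float   # seconds the DC waits for a result


def schedule(durations, max_children):
    """FIFO schedule onto max_children slots; returns (start, finish) pairs."""
    free_at = [0.0] * max_children  # time at which each slot becomes free
    times = []
    for d in durations:
        slot = free_at.index(min(free_at))
        start = free_at[slot]
        free_at[slot] = start + d
        times.append((start, start + d))
    return times


def timed_out(actions, max_children, throttle_on_dc):
    """Names of actions the DC would report as timed out.

    throttle_on_dc=False: everything is dispatched at t=0 and the node
    queues what it cannot run, so the DC's clock includes queueing time.
    throttle_on_dc=True: the DC holds an action back until the node has a
    free slot, so the clock starts when the action actually starts.
    """
    by_node = {}
    for a in actions:
        by_node.setdefault(a.node, []).append(a)
    failed = []
    for node_actions in by_node.values():
        times = schedule([a.duration for a in node_actions], max_children)
        for a, (start, finish) in zip(node_actions, times):
            clock_start = start if throttle_on_dc else 0.0
            if finish - clock_start > a.timeout:
                failed.append(a.name)
    return failed


if __name__ == "__main__":
    vms = [Action(f"start_vm{i}", "node1", 10.0, 20.0) for i in range(8)]
    print(timed_out(vms, max_children=2, throttle_on_dc=False))
    print(timed_out(vms, max_children=2, throttle_on_dc=True))
```

With two slots and eight 10-second starts, the last four finish 30 to 40 seconds after dispatch and are reported as timed out against a 20-second timeout, although none ran longer than 10 seconds. Holding them back on the DC yields the same schedule with no spurious failures.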

> (Reducing batch-limit to emulate this kills
> cluster-wide parallelism and is not optimal.) Clearly, it's not perfect
> either (since it assumes all rsc ops on a node are identical in
> weight; whereas in reality we may want to limit VM start-up to 4, but
> would happily see 32 IP addresses go up at once, or 48 monitors ...),
> but it is an appropriate simplification.
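The VM/IP/monitor example can be expressed as a per-node weighted budget instead of a plain count of children. This is a hypothetical illustration, not a Pacemaker interface: the operation labels, the budget of 96 and the weights 24, 3 and 2 are invented so that 4 VM starts, 32 IP starts or 48 monitors each exactly fill the budget.

```python
# Invented costs: 4 * 24 == 32 * 3 == 48 * 2 == 96
WEIGHTS = {"vm-start": 24, "ip-start": 3, "monitor": 2}


class WeightedNodeBudget:
    """Hypothetical per-node limiter where operations have different costs."""

    def __init__(self, budget, weights, default_weight=1):
        self.budget = budget
        self.weights = dict(weights)
        self.default_weight = default_weight  # cost of any unlisted op
        self.in_use = 0

    def weight(self, op):
        return self.weights.get(op, self.default_weight)

    def try_start(self, op):
        """Admit op only if its cost still fits in the remaining budget."""
        w = self.weight(op)
        if self.in_use + w > self.budget:
            return False
        self.in_use += w
        return True

    def finish(self, op):
        """Return op's cost to the budget when it completes."""
        self.in_use -= self.weight(op)


def how_many_fit(op, budget=96):
    """How many copies of op can run at once on an otherwise idle node."""
    node = WeightedNodeBudget(budget, WEIGHTS)
    count = 0
    while node.try_start(op):
        count += 1
    return count


if __name__ == "__main__":
    for op in ("vm-start", "ip-start", "monitor"):
        print(op, how_many_fit(op))
```

A plain MAX_CHILDREN limit is the special case where every weight is 1 and the budget equals the child limit, which is exactly the "all ops are identical in weight" simplification.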
> 
> migration-limit is indeed a special case (needed to limit nodes from
> being overloaded by migrate ops, which at the time were the only ops that
> affect two nodes at once - batch-limit="4" was too coarse a hammer). I
> do recall that we discussed making it more generic - so that one could
> configure cluster-/node-wide limits for certain operations of specific
> resource types, but that was (rightly) judged to be a rather complex can
> of worms by you.
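Migrations need their own limit because one migration loads two nodes. A minimal sketch of that idea, assuming a hypothetical per-node cap that a migration must satisfy on both its source and its target (-1 is treated as unlimited here); it illustrates the reasoning and does not describe how Pacemaker's scheduler actually counts migrations.

```python
from collections import Counter


class MigrationSlots:
    """Hypothetical per-node cap on concurrent migrations."""

    def __init__(self, limit):
        self.limit = limit       # -1 is treated as "no limit" in this sketch
        self.active = Counter()  # node name -> migrations touching that node

    def _full(self, node):
        return self.limit >= 0 and self.active[node] >= self.limit

    def try_migrate(self, source, target):
        """Start a migration only if neither endpoint is at its cap."""
        if self._full(source) or self._full(target):
            return False
        self.active[source] += 1  # load on the sending node
        self.active[target] += 1  # and on the receiving node
        return True

    def done(self, source, target):
        """Release the slot on both endpoints."""
        self.active[source] -= 1
        self.active[target] -= 1


if __name__ == "__main__":
    slots = MigrationSlots(limit=1)
    print(slots.try_migrate("node1", "node2"))  # node1 and node2 were idle
    print(slots.try_migrate("node3", "node2"))  # refused: node2 is busy
    print(slots.try_migrate("node3", "node4"))  # an independent pair is fine
```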
> 
>> If anything, we should likely be putting work into auto-tuning this
>> stuff instead.  Somehow.
> 
> I'm not sure about how batch-limit can be auto-tuned.

If the cib's CPU usage starts going too high, it's time to lower the limit.
Should be possible on Linux.
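The auto-tuning idea can be sketched as a feedback loop. This Python sketch assumes Linux (it reads utime and stime, fields 14 and 15 of /proc/<pid>/stat as documented in proc(5)) and that the caller already knows the cib process's PID. The 80%/50% thresholds, the ceiling of 64 and the halve-on-overload, add-one-when-idle policy (additive increase, multiplicative decrease) are invented for illustration and are not Pacemaker defaults.

```python
import os
import time


def cpu_seconds(pid):
    """Total user + system CPU time consumed by pid, in seconds (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (comm) is parenthesised and may contain spaces, so split after it;
    # the remaining list starts at field 3, hence field N is at index N - 3.
    fields = data.rsplit(")", 1)[1].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
    return (utime + stime) / os.sysconf("SC_CLK_TCK")


def cpu_fraction(pid, interval=1.0):
    """Fraction of one CPU that pid used over the sampling interval."""
    before = cpu_seconds(pid)
    time.sleep(interval)
    return (cpu_seconds(pid) - before) / interval


def next_batch_limit(current, cpu, high=0.8, low=0.5, floor=1, ceiling=64):
    """Halve the limit when the cib is overloaded, creep back up when it is not."""
    if cpu > high:
        return max(floor, current // 2)
    if cpu < low:
        return min(ceiling, current + 1)
    return current
```

A DC-side loop would then periodically run `limit = next_batch_limit(limit, cpu_fraction(cib_pid))`, where `cib_pid` is assumed to be known. Halving quickly and recovering slowly favours shedding load during a recovery storm over maximum parallelism.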

> migration-limit is mostly a function of the network bandwidth, too.
> 
> MAX_CHILDREN did, sort of, auto-tune (by defaulting to number of cores,
> or something similar, which was appropriate enough[1]).
> 
> It can all be made into a generic, powerful, flexible mechanism that
> describes them all. But I'm afraid that it'd also be quite complex. I'm
> happy to think about it, but the three limits we have/had seemed
> sufficient for the real-world.
> 
> 
> Regards,
>    Lars
> 
> [1] the main complaint was that it was configured via sysconfig, and not
> dynamic via a node attribute as it should be. When we reintroduce it, we
> may want to make nodes default to PCMK/LRMD_MAX_CHILDREN if unset in
> the CIB, and otherwise have that value override the environment
> variable?  That'd be a benefit now that pcmk and lrmd are more closely
> married.
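The precedence proposed in the footnote could look like the following. The sketch assumes the per-node CIB value has already been read (passed in as a string, or None when unset), checks PCMK_MAX_CHILDREN before LRMD_MAX_CHILDREN (an arbitrary choice), and falls back to the CPU count as an approximation of the "number of cores, or something similar" default. `effective_max_children()` is a hypothetical name.

```python
import os


def effective_max_children(node_attr=None, env=None):
    """Resolve a node's child limit: CIB node attribute > environment > CPU count."""
    env = os.environ if env is None else env
    if node_attr is not None:
        return int(node_attr)  # the dynamic, per-node value wins
    for var in ("PCMK_MAX_CHILDREN", "LRMD_MAX_CHILDREN"):
        if env.get(var):
            return int(env[var])  # the sysconfig-style setting
    return os.cpu_count() or 1  # os.cpu_count() may return None
```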

As above, the rate limiting needs to happen on the DC, which lends itself to being a property of the CIB and/or transition graph rather than something defined in sysconfig.

> 
> 
> -- 
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
