[Pacemaker] Managing big number of globally-unique clone instances

Fri Jul 18 03:16:52 EDT 2014

Hi Andrew, all,

I have a task that looks easy to solve with a globally-unique clone:
start a huge number of specific virtual machines to put load on a
connection multiplexer.

I decided to see how Pacemaker behaves in such a setup using the Dummy
resource agent, and found that the handling of each instance in the
"initial" transition (probe+start) slows down as clone-max increases.

For example, with 256 instances the transition took 225 seconds, ~0.88s
per instance. After I added 768 more instances (setting clone-max to
1024) and increased batch-limit to 512, the transition took almost an
hour (3507 seconds), or ~4.57s per added instance. Even taking into
account that monitoring of the already-started instances consumes some
resources, the latter number seems rather big.
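
To make the comparison explicit, here is a small Python sketch that
recomputes the per-instance figures from the two runs above. The
variable names are mine; the numbers are the measured ones.

```python
# Figures from the two test runs described above.
first_run_instances = 256
first_run_seconds = 225

total_instances = 1024
added_instances = total_instances - first_run_instances  # 768 new instances
second_run_seconds = 3507

# Average wall-clock time spent per instance handled in each transition.
per_instance_first = first_run_seconds / first_run_instances
per_instance_added = second_run_seconds / added_instances

# How much more expensive each added instance was in the second run.
slowdown = per_instance_added / per_instance_first

print(f"first run:  {per_instance_first:.2f} s per instance")
print(f"second run: {per_instance_added:.2f} s per added instance")
print(f"slowdown:   {slowdown:.1f}x")
```

So each added instance cost roughly five times more than an instance in
the first run, while the total instance count grew four times.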

The main CPU consumer on the DC while the transition is running is crmd.
Its memory footprint is around 85 MB, and the resulting CIB size,
including the status section, is around 2 MB.

Do you think this use case could be optimized with minimal effort?
Could it be done with configuration alone? Or might it be a trivial
development task, e.g. replacing a GList with a GHashTable somewhere?
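
To illustrate what I mean by the list-versus-hash-table guess, here is
a minimal Python sketch. It is only an analogy: I have not checked
whether crmd actually looks instances up this way, and the instance
names ("dummy:N") and function names are hypothetical. It counts how
many comparisons are needed to find every instance once, first by
scanning a list (like walking a GList) and then by key lookup (like a
GHashTable).

```python
def lookups_with_list(n):
    """Count comparisons when each of n instances is found by a linear scan."""
    names = [f"dummy:{i}" for i in range(n)]  # hypothetical instance names
    comparisons = 0
    for target in names:
        for candidate in names:
            comparisons += 1
            if candidate == target:
                break  # found it, stop scanning
    return comparisons


def lookups_with_hash(n):
    """Count lookups when each of n instances is found by key in a dict."""
    names = {f"dummy:{i}": i for i in range(n)}
    lookups = 0
    for target in list(names):
        _ = names[target]  # a single hashed lookup per instance
        lookups += 1
    return lookups


for n in (256, 1024):
    print(n, lookups_with_list(n), lookups_with_hash(n))
```

With a list, the total work grows with the square of the instance
count, so the cost per instance grows with the count itself. With a
hash table the cost per instance stays flat. If something like the
former is happening, it could explain the kind of slowdown I see.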

Of course I can dig deeper and provide any additional information,
e.g. crmd profiling results, if it is hard to answer off the top of
your head.

Best,
Vladislav