[Pacemaker] Problems with resource scaling

Wed Feb 29 02:43:45 EST 2012

25.02.2012 06:35, Atif Faheem wrote:
> Hi. I have been experimenting with resource scalability in Pacemaker. I
> started with no resources, and attempted to configure & start a few
> hundred dummy resources (a dummy ocf script that does not load the CPU)
> on a cluster of 4 virtual machines using crm configure, and noted that
> after adding about 200 resources the cluster grinds to a halt. To start
> an additional 100 resources, it took about 10 minutes for the crm
> configure to complete, and an additional 10 minutes for the resources to
> come up. I noted that the CIB process spikes to >90% CPU as soon as a new
> resource is configured on the system, and stays there for some time.
> During this time it is possible that "crm status" shows all nodes to be
> offline. After a period of such instability, CIB CPU backs off, and then
> the cluster is stable. Is this behavior expected with the
> aforementioned cluster size / resource count? Are there any parameters
> or knobs we may want to look at / twiddle? My understanding was that
> with recent performance changes resource scaling should be much higher. 
> 

Did you add/start resources one by one or in batched mode?
By "batched" I mean:
* dump configuration to file
* change/add resource definitions in a file
* upload new configuration to cluster with "crm configure load update"
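
The three steps above could look roughly like the sketch below. It is only
an illustration: the temporary file, the dummyN resource names and the
choice of the ocf:pacemaker:Dummy agent are assumptions, not taken from the
original setup. The crm calls are skipped when the crm shell is not
installed, so the script can be tried without a cluster.

```shell
#!/bin/sh
# Sketch of the batched workflow; file and resource names are hypothetical.
set -eu

cfg=$(mktemp)

# Step 1: dump the current configuration to a file (only if crm is present).
if command -v crm >/dev/null 2>&1; then
    crm configure show > "$cfg"
fi

# Step 2: add all new resource definitions to the file in one go,
# instead of running "crm configure primitive ..." 100 times.
i=1
while [ "$i" -le 100 ]; do
    echo "primitive dummy$i ocf:pacemaker:Dummy" >> "$cfg"
    i=$((i + 1))
done

# Step 3: upload the whole file, so the cluster sees a single update.
if command -v crm >/dev/null 2>&1; then
    crm configure load update "$cfg"
fi
```

The point of the approach is that the cluster processes one configuration
change rather than one change per resource.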

I now have a cluster with ~300 resource instances (most of them are
unallocated clone instances, though, as a workaround for an issue that
was hopefully fixed a day ago, cl#5038). I have never observed the cib
hogging the CPU with the batched approach, although that was a real
problem when handling resources one by one.

Best,
Vladislav