[ClusterLabs] poor performance for large resource configuration

Thu Oct 24 12:01:57 UTC 2024

On 10/21/24 13:07, zufei chen wrote:
> Hi all,
>
> Background:
>
>  1. lustre(2.15.5) + corosync(3.1.5) + pacemaker(2.1.0-8.el8) +
>     pcs(0.10.8)
>  2. There are 11 nodes in total, divided into 3 groups. If a node
>     fails within a group, the resources can only be taken over by
>     nodes within that group.
>  3. Each node has 2 MDTs and 16 OSTs.
>
> Issues:
>
>  1. The resource configuration time increases progressively: the
>     second one, mdt-0, took only 8s, while the last, ost-175, took
>     1min 37s.
>  2. The total time taken for the configuration is approximately 2
>     hours and 31 minutes. Is there a way to speed this up?
>
>
> Attachments:
> creation script: pcs_create.sh
> creation log: pcs_create.log
>
Hi,

You could build the cluster CIB configuration in a file by running the
pcs commands with the '-f' option, and then push it to pacemaker all at
once.

pcs cluster cib > original.xml
cp original.xml new.xml
pcs -f new.xml <command>
...
...
pcs cluster cib-push new.xml diff-against=original.xml

Then wait for the cluster to settle into a stable state:

crm_resource --wait

Or, since version 0.11.8 (newer than the 0.10.8 you listed), pcs has its
own command for this:

pcs status wait [<timeout>]
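
Put together, the steps above might look like the rough sketch below. I
have not seen the details of your pcs_create.sh, so treat the following
as assumptions: the resource names follow your ost-N pattern,
ocf:pacemaker:Dummy is only a placeholder for your real resource agent
and its options, and the 'run' helper with its DRY_RUN switch is a
hypothetical convenience for previewing the commands. The point is that
every 'pcs -f' call only edits the local file, so the cluster receives
one configuration change instead of one per resource.

```shell
#!/usr/bin/env bash
# Sketch: stage all resources in an offline CIB file, then push it once.
set -euo pipefail

# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Add resources ost-<first>..ost-<last> to the offline CIB file "$1".
# ocf:pacemaker:Dummy is a placeholder; substitute the real agent and options.
stage_osts() {
    local cib=$1 first=$2 last=$3 i
    for ((i = first; i <= last; i++)); do
        run pcs -f "$cib" resource create "ost-$i" ocf:pacemaker:Dummy
    done
}

# Snapshot the live CIB, stage everything offline, push once, then wait.
batch_create() {
    local first=$1 last=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "pcs cluster cib > original.xml"
    else
        pcs cluster cib > original.xml
    fi
    run cp original.xml new.xml
    stage_osts new.xml "$first" "$last"
    run pcs cluster cib-push new.xml diff-against=original.xml
    run crm_resource --wait
}
```

For example, 'DRY_RUN=1 batch_create 0 175' would only print the
commands for ost-0 through ost-175, and 'batch_create 0 175' would run
them. Constraints and the MDT resources could be staged into new.xml the
same way before the single push.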

I hope this will help you to improve the performance.

Regards,
Miroslav