[ClusterLabs] Pacemaker 1.1.17 Release Candidate 4 (likely final)

Wed Jun 21 03:58:10 EDT 2017

Ken Gaillot <kgaillot at redhat.com> writes:

> The most significant change in this release is a new cluster option to
> improve scalability.
>
> As users start to create clusters with hundreds of resources and many
> nodes, one bottleneck is a complete reprobe of all resources (for
> example, after a cleanup of all resources).

Hi,

Does crm_resource --cleanup without any --resource specified do this?
Does this happen in any other way (automatic or manual)?

> This can generate enough CIB updates to get the crmd's CIB connection
> dropped for not processing them quickly enough.

Is this a catastrophic scenario, or does the cluster recover gently?

> This bottleneck has been addressed with a new cluster option,
> cluster-ipc-limit, to raise the threshold for dropping the connection.
> The default is 500. The recommended value is the number of nodes in the
> cluster multiplied by the number of resources.

I'm running a production cluster with 6 nodes and 159 resources at the
moment, so the recommended value (6 * 159 = 954) is almost twice the
above default.  What symptoms should I
expect to see under 1.1.16?  (1.1.16 has just been released with Debian
stretch.  We can't really upgrade it, but changing the built-in default
is possible if it makes sense.)
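
For reference, here is a minimal sketch of the arithmetic behind my
numbers, assuming the recommendation is simply nodes times resources as
quoted above.  The function and constant names are my own illustration,
not anything from Pacemaker itself:

```python
# Default threshold quoted in the release announcement.
DEFAULT_CLUSTER_IPC_LIMIT = 500


def recommended_ipc_limit(nodes: int, resources: int) -> int:
    """Hypothetical helper: recommended cluster-ipc-limit as nodes * resources."""
    return nodes * resources


# My cluster: 6 nodes, 159 resources.
recommended = recommended_ipc_limit(6, 159)
ratio = recommended / DEFAULT_CLUSTER_IPC_LIMIT

print(recommended)                               # 954
print(f"{ratio:.2f}x the default")               # 1.91x the default
print(recommended > DEFAULT_CLUSTER_IPC_LIMIT)   # True
```

So by that rule of thumb my cluster is already well past the default
threshold.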
-- 
Thanks,
Feri