[ClusterLabs] Cluster metrics and collectd

Ken Gaillot kgaillot at redhat.com
Fri Oct 27 11:21:17 EDT 2017


On Thu, 2017-10-26 at 19:09 +1100, Huw Davies wrote:
> I was wondering if anyone had looked into getting cluster metrics
> (counters and performance) and exposing them via collectd?
> 
> We’re about to go live with a large application on a four-node RHEL
> cluster and are interested in understanding the cluster loads. This
> will help us decide whether it’s better to run everything on a couple
> of nodes or to spread out across the whole cluster.
> 
> We are particularly concerned that locking could be a bottleneck (it
> is in the current implementation on another clustering solution).
> 
> Huw Davies           | e-mail: Huw.Davies at kerberos.davies.net.au
> Melbourne            | "If soccer was meant to be played in the
> Australia            | air, the sky would be painted green" 
> 

I'm not familiar with any collectd metrics for pacemaker, but I would
be curious what you end up going with.

Regarding load-balancing, the key thing is that the point of high-
availability is to withstand node loss. Your resources should be *able*
to run on a quorate subset of nodes, even if you spread them out during
normal operation, so it's a good idea to test that scenario under
production load.

With the default quorum options, a four-node cluster can lose only one
node and remain quorate, so you'd want to ensure you can run
comfortably on three nodes. If you use corosync's auto tie-breaker
feature, you could go down to two nodes in some conditions.
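
To make the arithmetic concrete, here is a minimal sketch of the quorum
rule, assuming one vote per node and a simple majority. The function
names are made up for illustration and are not part of corosync. The
tie-breaker branch models only the default behaviour as I understand it
(in an even split, the half holding the lowest node ID stays quorate)
and ignores other votequorum options.

```python
def quorum_threshold(total_votes: int) -> int:
    """Votes needed for quorum under simple majority: floor(n/2) + 1."""
    return total_votes // 2 + 1


def tolerable_losses(total_votes: int) -> int:
    """How many nodes can be lost while the remainder stays quorate."""
    return total_votes - quorum_threshold(total_votes)


def is_quorate(partition, all_nodes, auto_tie_breaker=False):
    """Return True if the node IDs in `partition` would hold quorum.

    Assumption: one vote per node. With auto_tie_breaker, an exact
    50/50 split is won by the side containing the lowest node ID.
    """
    votes = len(partition)
    total = len(all_nodes)
    if votes >= quorum_threshold(total):
        return True
    if auto_tie_breaker and votes * 2 == total:
        return min(all_nodes) in partition
    return False


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "nodes: quorum", quorum_threshold(n),
              "can lose", tolerable_losses(n))
```

For four nodes this gives a threshold of three votes and one tolerable
loss, which is why the three-node case is the one to load-test.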

Whether it's "better" to concentrate or load-balance in normal
operation is a matter of trade-offs. The advantages of concentrating
are (1) lower power usage on the idle nodes, and (2) if production load
increases, you'll notice more quickly whether your subset of nodes can
handle it. The advantages of load-balancing are (1) continuously
exercising all nodes so you're not surprised in an outage if a node has
become degraded in some fashion, and (2) possibly better performance,
depending on workload and capacities.

Note that pacemaker has a "placement-strategy" option that lets you
fine-tune load-balancing vs concentration.
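
As a rough illustration of the difference between spreading out and
concentrating, here is a toy model. It is not pacemaker's actual
allocation algorithm, and the node names, capacities and loads are
invented. The strategy names "balanced" and "minimal" are borrowed from
the placement-strategy values only to label the two behaviours.

```python
def place(resources, capacities, strategy):
    """Assign each (name, load) pair to a node; return {node: [names]}.

    Toy model only: "balanced" picks the node with the most free
    capacity, "minimal" picks the first listed node that still fits.
    """
    free = dict(capacities)
    placement = {node: [] for node in capacities}
    for name, load in resources:
        candidates = [n for n in free if free[n] >= load]
        if not candidates:
            raise ValueError(f"no node has capacity for {name}")
        if strategy == "balanced":
            # Ties go to the node listed first.
            node = max(candidates, key=lambda n: free[n])
        elif strategy == "minimal":
            node = candidates[0]
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        free[node] -= load
        placement[node].append(name)
    return placement


if __name__ == "__main__":
    # Hypothetical capacities and loads in arbitrary units.
    capacities = {"node1": 8, "node2": 8, "node3": 8, "node4": 8}
    resources = [("db", 4), ("web", 2), ("cache", 2), ("batch", 4)]
    for strategy in ("balanced", "minimal"):
        print(strategy, place(resources, capacities, strategy))
```

With these made-up numbers, the balanced run puts one resource on each
node, while the minimal run fills node1 first and leaves node3 and
node4 idle.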
-- 
Ken Gaillot <kgaillot at redhat.com>



