[ClusterLabs] Antw: [EXT] Re: Maximum cluster size with Pacemaker 2.x and Corosync 3.x, and scaling to hundreds of nodes

Strahil Nikolov hunter86_bg at yahoo.com
Fri Jul 31 05:19:21 EDT 2020

When I joined the previous company, we were just decommissioning a 41-node Scale-out HANA /SLES 11/ with a 21-node /SLES 12/ cluster.
The most popular was a 2-node cluster, but we had a lot of issues. For me, 2-node clusters with qnet will be the most popular.

Strahil Nikolov

On 31 July 2020, 8:57:29 GMT+03:00, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>>> Ken Gaillot <kgaillot at redhat.com> wrote on 30.07.2020 at 16:43 in
><93b973947008b62c4848f8a799ddc3f0949451e8.camel at redhat.com>:
>> On Wed, 2020-07-29 at 23:12 +0000, Toby Haynes wrote:
>>> In Corosync 1.x there was a limit on the maximum number of active
>>> nodes in a corosync cluster - browsing the mailing list says 64
>>> hosts. The Pacemaker 1.1 documentation says scalability goes up to
>>> nodes. The Pacemaker 2.0 documentation says the same, although I
>>> can't find a maximum number of nodes in Corosync 3.
>> My understanding is that there is no theoretical limit, only
>> limits, so giving a single number is somewhat arbitrary.
>> There is a huge difference between full cluster nodes (running
>> and all pacemaker daemons) and Pacemaker Remote nodes (running only
>> pacemaker-remoted).
>> Corosync uses a ring model where a token has to be passed in a very
>> short amount of time, and also has message guarantees (i.e. every
>> has to confirm receiving a message before it is made available), so
>> there is a low practical limit to full cluster nodes. The 16 or 32
>> number comes from what enterprise providers are willing to support,
>> is a good ballpark for a real-world comfort zone. Even at 32 you need
>What I'd like to see is some table with recommended parameters,
>depending on
>the number of nodes and the maximum acceptable network delay.
>The other thing I'd like to see is a world-wide histogram (x-axis:
>number of
>nodes, y-axis: number of installations) of pacemaker clusters.
>Here we have a configuration of two 2-node clusters and one 3-node
>Initially we had planned to make one 7-node cluster, but basically
>(common fencing) and configuration issues (becoming complex) prevented
>> dedicated fast network and likely some tuning tweaks. Going beyond
>> is possible but depends on hardware and tuning, and becomes sensitive
>> to slight disturbances.
>> Pacemaker Remote nodes on the other hand are lightweight. They
>> communicate with only a single cluster node, with relatively low
>> traffic. The upper bound is unknown; some people report getting
>> errors with as few as 40 remote nodes, while others run over 100 with
>> no problems. So it may well depend on network and hardware
>See the parameter table requested above.
>> at high numbers, and you can run far more in VMs or containers than
>> bare metal, since traffic will (usually) be internal rather than over
>> the network.
>> I would expect a cluster with 16-32 full nodes and several hundred
>> remotes (maybe even thousands in VMs or containers) to be feasible
>> the right hardware and tuning.
>I wonder: Do such configurations have a lot of identical or similar
>or do they do massive load balancing, or do they run many different
>> Since remotes don't run all the daemons, they can't do things like
>> directly execute fence devices or contribute to cluster quorum, but
>> remotes on bare metal or VMs are not really in a hierarchy as far as
>> the services being clustered go. A resource can move between cluster
>> and remote nodes, and a remote's connection can move from one cluster
>> node to another without interrupting the services on the remote.
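
As a rough illustration of why the token-passing model quoted above puts a practical limit on full cluster nodes, here is a minimal sketch of how the effective token timeout grows with node count. It assumes the rule described in corosync.conf(5): with a nodelist of at least 3 nodes, the real timeout is token + (number_of_nodes - 2) * token_coefficient. The function name is hypothetical, and the default of 3000 ms for token is an assumption (defaults differ between Corosync versions), so check the man page for your installed version.

```python
def effective_token_timeout_ms(nodes, token_ms=3000, token_coefficient_ms=650):
    """Estimate the effective Corosync token timeout in milliseconds.

    Assumes the corosync.conf(5) rule: the coefficient only applies when
    the nodelist contains at least 3 nodes. The default arguments are
    illustrative assumptions, not values read from a live cluster.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    if nodes < 3:
        # With fewer than 3 nodes the coefficient is not applied.
        return token_ms
    return token_ms + (nodes - 2) * token_coefficient_ms


if __name__ == "__main__":
    # Print a small table for a few cluster sizes mentioned in the thread.
    for n in (2, 3, 16, 32, 64):
        print(f"{n:3d} nodes -> {effective_token_timeout_ms(n)} ms")
```

This only models the configured timeout, not actual network latency or membership behaviour. It may still help when thinking about the kind of parameter table asked for above, since it shows how quickly failure detection slows down as full nodes are added.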
