[ClusterLabs] Coming in Pacemaker 3.0.2: further IPC improvements

Chris Lumens clumens at redhat.com
Wed Sep 10 15:13:18 UTC 2025


If you have a large cluster with many nodes or resources, you may have
seen Pacemaker become unresponsive and log a message about "evicting
client..." around the same time.  This can happen when a sudden spike
of IPC messages between daemons causes a backlog that cannot be cleared
fast enough.  Pacemaker then assumes that a daemon has died and restarts
it.

Most of the time, this assumption is not correct - the daemon has not
died, it's just not processing IPC messages as fast as the other end of
the connection is sending them, causing its backlog to grow.  One way to
avoid this problem is with the cluster-ipc-limit cluster option, but you
need to know to set it beforehand, and it's always possible for the
backlog to grow beyond whatever you set.
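
To illustrate why a fixed limit only delays the problem, here is a toy
model of the backlog described above.  It is not Pacemaker code: the
function name and the parameters send_rate, process_rate, and limit are
made up for this sketch, and it assumes constant rates per tick.

```python
def first_overflow(send_rate, process_rate, limit, max_ticks=10_000):
    """Return the first tick at which the backlog exceeds limit, or None.

    Toy model only: each tick the sender queues send_rate messages and
    the receiver drains at most process_rate of them.
    """
    backlog = 0
    for tick in range(1, max_ticks + 1):
        backlog += send_rate                    # messages arriving this tick
        backlog -= min(backlog, process_rate)   # messages handled this tick
        if backlog > limit:
            return tick
    return None


# A receiver that is slower than the sender crosses any fixed limit;
# a larger limit only moves the crossing point later.
print(first_overflow(120, 100, limit=500))    # 26
print(first_overflow(120, 100, limit=5000))   # 251
print(first_overflow(100, 100, limit=500))    # None (receiver keeps up)
```

In this model the receiver is alive and working the whole time, yet it
still crosses the limit, which is the situation described above.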

Starting with Pacemaker 3.0.2, the daemons will no longer be subject to
cluster-ipc-limit or to being evicted as long as we can detect they are
still processing messages.  Other IPC clients will still be subject to
these restrictions - we don't believe a client (which could be a command
line program like crm_mon or a third-party application) should be
allowed to crash a daemon.  Additionally, it's still possible for a
daemon to be evicted if it has truly crashed or is taking a very long
time to process a single message.
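
As a rough sketch of the policy described above (not the actual
Pacemaker implementation), the decision could be modeled like this.
The Peer type, should_evict(), and the stall_timeout parameter are all
hypothetical names for illustration; in particular, how Pacemaker
detects that a daemon is "still processing messages" is not specified
here, so a simple time-since-last-progress check is assumed.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    is_daemon: bool                # True for a Pacemaker daemon, False for other clients
    backlog: int                   # messages queued for this peer
    seconds_since_progress: float  # time since it last handled a message


def should_evict(peer, ipc_limit, stall_timeout):
    """Hypothetical eviction rule modeled on the behavior described above."""
    if peer.is_daemon:
        # Assumption: a daemon is evicted only when it appears stalled,
        # no matter how large its backlog has grown.
        return peer.seconds_since_progress > stall_timeout
    # Other IPC clients remain subject to the backlog limit.
    return peer.backlog > ipc_limit


busy_daemon = Peer(is_daemon=True, backlog=10_000, seconds_since_progress=0.5)
hung_daemon = Peer(is_daemon=True, backlog=10, seconds_since_progress=120.0)
slow_client = Peer(is_daemon=False, backlog=501, seconds_since_progress=0.0)

print(should_evict(busy_daemon, ipc_limit=500, stall_timeout=30.0))  # False
print(should_evict(hung_daemon, ipc_limit=500, stall_timeout=30.0))  # True
print(should_evict(slow_client, ipc_limit=500, stall_timeout=30.0))  # True
```

The point of the sketch is only that backlog size stops being the
deciding factor for daemons, while it still is for everything else.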

The majority of users should never run into the eviction problem in the
first place.  For those who do, these changes should result in improved
cluster stability.  If you've set cluster-ipc-limit at some point, you
may want to experiment with disabling it in 3.0.2, though leaving it set
won't cause any harm.

- Chris


