[ClusterLabs] Sub-second failover detection in Corosync/Pacemaker clusters - 2026 update?
Holger Haidinger <DE ERL SWD EM>
Holger.Haidinger at fluenceenergy.com
Fri Feb 20 15:41:29 UTC 2026
Hi everyone,
I'm revisiting a thread from 2015 (https://www.mail-archive.com/users@clusterlabs.org/msg00554.html) about achieving sub-second failover detection in HA clusters, and I'm curious about the current state of affairs more than a decade later.
My Environment:
- Corosync 3.1.6
- Pacemaker 2.1.2
- Architecture: 2-node cluster + QDevice (also testing 3-node setups)
- Network: Dedicated physical NIC for cluster traffic (low-latency requirements)
Specific Questions:
1. With modern Corosync/Pacemaker versions, is sub-second fault detection and failover initiation realistically achievable in production environments?
2. Are there any published measurements or community experiences showing the fastest stable failover times you've achieved? What's considered a reliable minimum time span?
3. Have there been significant enhancements in the newer versions of Corosync and Pacemaker (post-2015) that specifically target detection speed and failover latency?
4. If sub-second detection is possible, what are the key configuration parameters and potential trade-offs (false positives, network sensitivity, resource overhead)?
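To make question 4 concrete, here is a rough sketch of the timing budget involved. It assumes the totem behaviour described in corosync.conf(5): with three or more nodes in the nodelist, the effective token timeout is token + (nodes - 2) * token_coefficient, with token_coefficient defaulting to 650 ms and consensus defaulting to 1.2 * token. The function names are hypothetical helpers, not part of any Corosync API, and treating "token + consensus" as the detection budget is a simplification that ignores fencing and resource recovery time.

```python
# Rough detection-time budget for a Corosync totem ring (all values in ms).
# Assumption: formula and defaults as described in corosync.conf(5).
# The helper names are illustrative only, not a Corosync API.


def effective_token_ms(token, nodes, token_coefficient=650):
    """Token timeout that totem would actually use for a given node count."""
    # The coefficient only applies once the nodelist has at least 3 nodes.
    extra_nodes = max(nodes - 2, 0)
    return token + extra_nodes * token_coefficient


def detection_budget_ms(token, nodes, token_coefficient=650, consensus=None):
    """Approximate time until a new membership can form after a node loss."""
    eff = effective_token_ms(token, nodes, token_coefficient)
    # Assumption: an unset consensus scales from the effective token (1.2x).
    if consensus is None:
        consensus = 1.2 * eff
    return eff + consensus


if __name__ == "__main__":
    for nodes in (2, 3):
        print(nodes, "nodes:", detection_budget_ms(token=300, nodes=nodes), "ms")
```

Under these assumptions, a 300 ms token on the 3-node setup already yields an effective token of 950 ms via effective_token_ms(300, 3), so token_coefficient would have to be lowered as well before a sub-second budget is even arithmetically possible there.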
Thanks in advance!
Holger Haidinger