[ClusterLabs] Sub-second failover detection in Corosync/Pacemaker clusters - 2026 update?

Fri Feb 20 15:41:29 UTC 2026

Hi everyone,

I'm revisiting a 2015 thread (https://www.mail-archive.com/users@clusterlabs.org/msg00554.html) about achieving sub-second failover detection in HA clusters, and I'm curious about the current state of affairs more than a decade later.

My Environment:

- Corosync 3.1.6
- Pacemaker 2.1.2
- Architecture: 2-node cluster + QDevice (also testing 3-node setups)
- Network: Dedicated physical NIC for cluster traffic (low-latency requirements)

Specific Questions:

1. With modern Corosync/Pacemaker versions, is sub-second fault detection and failover initiation realistically achievable in production environments?
2. Are there any published measurements or community experiences showing the fastest stable failover times you've achieved? What is considered a reliable minimum failover time?
3. Have there been significant enhancements in the newer versions of Corosync and Pacemaker (post-2015) that specifically target detection speed and failover latency?
4. If sub-second detection is possible, what are the key configuration parameters and potential trade-offs (false positives, network sensitivity, resource overhead)?

Thanks in advance!

Holger Haidinger
