[ClusterLabs] Corosync [RFC] Enforcing encryption by default (disabling unencrypted UDP/UDPU in standard builds)
Jan Friesse
jfriesse at redhat.com
Mon May 4 14:52:55 UTC 2026
Hi everyone,
Over the past while, we've dealt with several CVEs and bug reports
related to maliciously crafted packets causing crashes or undefined
behavior on unencrypted transports. To drastically minimize this
unauthenticated remote attack vector, I am proposing a fundamental shift
to a "secure by default" posture for Corosync.
I have opened a PR to enforce encryption at compile time for default builds:
https://github.com/corosync/corosync/pull/821
What this PR does:
In a default build, the code to handle unencrypted traffic is completely
omitted. Unencrypted configurations will be rejected, and the legacy
totemudp and udpu transports will be entirely unavailable. Encryption is
strictly mandatory.
The Escape Hatch:
To be clear, this does not break highly specific edge cases or legacy
systems - it just shifts the burden of choice. Package maintainers or
users compiling from source who absolutely need the old behavior can
consciously opt in using two new configure flags:
  --enable-unencrypted: Allows crypto_cipher and crypto_hash to be set
    to 'none'.
  --enable-udpu: Restores the legacy totemudp and udpu transports
    (strictly requires --enable-unencrypted).
To ensure administrators can easily audit their binaries, corosync -v
will now explicitly display enforce_encryption and without_udpu for
standard builds, or unencrypted and udpu if the legacy flags were used.
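For anyone who wants to script that audit, here is a minimal Python sketch. It assumes the four feature names above appear as separate words somewhere in the corosync -v output; the exact output layout is not specified in this mail, and the function name classify_build is hypothetical.

```python
import re

def classify_build(version_output):
    """Classify a corosync build from the text printed by `corosync -v`.

    Assumption: the feature names from this RFC (enforce_encryption,
    without_udpu, unencrypted, udpu) show up as standalone words.
    """
    # Split into word tokens so "udpu" is not matched inside "without_udpu".
    tokens = set(re.findall(r"[A-Za-z0-9_]+", version_output))
    return {
        "encryption_enforced": "enforce_encryption" in tokens,
        "unencrypted_allowed": "unencrypted" in tokens,
        "udpu_available": "udpu" in tokens,
    }
```

The input would typically come from something like subprocess.run(["corosync", "-v"], capture_output=True, text=True).stdout. Matching whole tokens rather than substrings is deliberate, since "udpu" is contained in "without_udpu".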
Anticipated concerns:
I want to proactively address a few valid arguments against this change,
and explain why the CVE risk and "secure by default" philosophy still
take precedence:
- Private/Isolated Networks:
Many clusters run on private VLANs or dedicated backend cluster
networks. That setup is valid, but relying solely on the network layer
for security is risky and runs counter to defense in depth. A
misconfigured VLAN or a compromised adjacent machine shatters that
isolation. Furthermore, every time a
fuzzer finds a flaw in the legacy UDP transport, we have to drop
everything for a CVE. Enforcing encryption neutralizes this class of
unauthenticated remote attacks. We cannot leave standard deployments
vulnerable just to save isolated power users from adding a compile flag.
- Setup Complexity and Maintenance:
Setting up an unencrypted cluster is undeniably easier because you don't
have to generate and distribute an authkey. Unencrypted traffic is also
easier to inspect with tcpdump when troubleshooting. However, we cannot
sacrifice baseline production security for configuration and debugging
convenience. Distributing an authkey is a standard, one-time operation
easily handled by modern automation, and debugging practices must evolve
to match secure standards.
- Lower Memory Footprint of UDP(U):
Legacy UDP and UDPU transports have a smaller memory footprint compared
to KNET, which is sometimes preferred in embedded devices or heavily
constrained edge environments. If you are running an embedded system
where KNET's memory usage is a dealbreaker, the default build is simply
not for you; the --enable-udpu flag (together with --enable-unencrypted,
which it requires) exists specifically for this hardware profile.
- Performance and Latency Overhead:
Some might argue that forcing crypto adds latency. Modern CPUs handle
AES (especially via AES-NI) with incredible efficiency, making the
overhead negligible for 99% of workloads. For the 1% (like
ultra-low-latency HPC) where every microsecond is critical, the
compile-time escape hatch allows you to bypass it.
- Upgrade Path and Distro Breakage:
Existing unencrypted clusters will fail to start if they upgrade to a
standard default build of this new version. While we take backwards
compatibility seriously, security must eventually take priority over
legacy convenience. Distro maintainers can choose to compile with the
escape hatch flags if they absolutely must maintain seamless upgrades
for a specific release cycle.
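To find affected clusters before upgrading, a rough pre-flight check along the following lines may help. This is a sketch under several assumptions, not the validation logic from the PR: it only looks at keys directly inside the totem section, it treats an unset crypto_cipher or crypto_hash as 'none' and an unset transport as knet, and the helper names are hypothetical.

```python
def totem_settings(conf_text):
    """Collect key: value pairs that sit directly inside totem { }."""
    settings = {}
    stack = []  # names of the sections that are currently open
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.endswith("{"):
            stack.append(line[:-1].strip())
        elif line == "}":
            if stack:
                stack.pop()
        elif ":" in line and stack == ["totem"]:
            key, value = line.split(":", 1)
            settings[key.strip()] = value.strip().lower()
    return settings

def upgrade_blockers(conf_text):
    """List settings that a default (encryption-enforcing) build would
    presumably reject, based on the description in this RFC."""
    settings = totem_settings(conf_text)
    problems = []
    # Assumption: an unset crypto option behaves like 'none'.
    for key in ("crypto_cipher", "crypto_hash"):
        if settings.get(key, "none") == "none":
            problems.append(f"{key} is 'none' or unset")
    # Assumption: an unset transport means knet.
    transport = settings.get("transport", "knet")
    if transport in ("udp", "udpu"):
        problems.append(f"transport '{transport}' needs --enable-udpu")
    return problems
```

An empty list means the sketch found nothing that should block the upgrade; anything else names the setting to change, or the escape hatch flag the build would need.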
Feedback requested:
Before moving forward, I would like to get the community's eyes on this
patchset. I am specifically looking for technical and architectural
feedback.
Because the compile-time escape hatch exists, niche use cases are fully
covered. I'm not looking for "we've always done it this way" feedback,
but if there is a fundamental, fact-based technical reason why a
standard, modern Corosync deployment cannot enforce encryption by
default that I haven't considered above, I definitely want to hear it.
Please take a look at the PR and drop your reviews, ACKs, or technical
concerns either here on the list or directly on GitHub.
Thanks,
Honza