[ClusterLabs] Corosync [RFC] Enforcing encryption by default (disabling unencrypted UDP/UDPU in standard builds)

Mon May 4 14:52:55 UTC 2026

Hi everyone,

Over the past while, we've dealt with several CVEs and bug reports 
related to maliciously crafted packets causing crashes or undefined 
behavior on unencrypted transports. To sharply reduce this 
unauthenticated remote attack surface, I am proposing a fundamental shift 
to a "secure by default" posture for Corosync.

I have opened a PR to enforce encryption at compile time for default builds:
https://github.com/corosync/corosync/pull/821

What this PR does:
In a default build, the code to handle unencrypted traffic is completely 
omitted. Unencrypted configurations will be rejected, and the legacy 
totemudp and udpu transports will be entirely unavailable. Encryption is 
strictly mandatory.

The Escape Hatch:
To be clear, this does not break highly specific edge cases or legacy 
systems - it just shifts the burden of choice. Package maintainers or 
users compiling from source who absolutely need the old behavior can 
consciously opt in using two new configure flags:

   --enable-unencrypted: Allows crypto_cipher and crypto_hash to be set
      to 'none'.

   --enable-udpu: Restores the legacy totemudp and udpu transports
      (strictly requires --enable-unencrypted).
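
To make the intended behavior concrete, here is a rough Python model of 
the rules described above. It is only an illustration, not the code in 
the PR: the names BuildFlags and rejection_reasons are made up for this 
sketch, and it assumes that a missing transport means knet, that a 
missing crypto_cipher or crypto_hash means 'none', and that either one 
being 'none' counts as an unencrypted configuration.

```python
from dataclasses import dataclass

# Transport names treated as "legacy" in this sketch.
LEGACY_TRANSPORTS = {"udp", "udpu"}


@dataclass(frozen=True)
class BuildFlags:
    """Hypothetical stand-in for the two configure flags."""
    enable_unencrypted: bool = False
    enable_udpu: bool = False

    def __post_init__(self):
        # --enable-udpu strictly requires --enable-unencrypted.
        if self.enable_udpu and not self.enable_unencrypted:
            raise ValueError("--enable-udpu requires --enable-unencrypted")


def rejection_reasons(totem, flags=BuildFlags()):
    """Return the reasons a totem section would be rejected (empty if none)."""
    reasons = []
    transport = totem.get("transport", "knet")      # assumed default
    cipher = totem.get("crypto_cipher", "none")     # assumed default
    digest = totem.get("crypto_hash", "none")       # assumed default

    if transport in LEGACY_TRANSPORTS and not flags.enable_udpu:
        reasons.append("transport '%s' is not built in" % transport)
    # Assumption: either value being 'none' makes the config unencrypted.
    if "none" in (cipher, digest) and not flags.enable_unencrypted:
        reasons.append("crypto_cipher/crypto_hash must not be 'none'")
    return reasons
```

With these assumptions, a knet configuration with a cipher and a hash 
set passes on a default build, while a bare udpu configuration is 
rejected for two reasons unless both flags were given at build time.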

To ensure administrators can easily audit their binaries, corosync -v 
will now explicitly display enforce_encryption and without_udpu for 
standard builds, or unencrypted and udpu if the legacy flags were used.
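
For scripted audits, a small Python sketch along these lines could 
classify a binary from its corosync -v output. The feature names come 
from the paragraph above; the exact layout of the output is not spelled 
out here, so the sample string is hypothetical and the sketch only 
assumes the names appear as whitespace-separated words.

```python
def audit_build(version_output):
    """Summarize encryption-related build features from `corosync -v` text."""
    # Whole-word matching matters: "without_udpu" must not count as "udpu".
    words = set(version_output.split())
    return {
        "encryption_enforced": "enforce_encryption" in words,
        "unencrypted_allowed": "unencrypted" in words,
        "udpu_available": "udpu" in words,
    }


# Hypothetical sample; the real output format may differ.
SAMPLE_STANDARD = "Built-in features: enforce_encryption without_udpu"
print(audit_build(SAMPLE_STANDARD))
```

On a real host the input would come from running corosync -v, for 
example via subprocess.run(["corosync", "-v"], capture_output=True, 
text=True).stdout.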

Anticipated concerns:
I want to proactively address a few valid arguments against this change, 
and explain why the CVE risk and "secure by default" philosophy still 
take precedence:

- Private/Isolated Networks:
Many clusters run on private VLANs or dedicated backend cluster 
networks. That is a valid setup, but relying solely on the network layer 
for security runs against defense in depth. A misconfigured VLAN or a 
compromised adjacent machine breaks that isolation. Furthermore, every time a 
fuzzer finds a flaw in the legacy UDP transport, we have to drop 
everything for a CVE. Enforcing encryption neutralizes this class of 
unauthenticated remote attacks. We cannot leave standard deployments 
vulnerable just to save isolated power users from adding a compile flag.

- Setup Complexity and Maintenance:
Setting up an unencrypted cluster is undeniably easier because you don't 
have to generate and distribute an authkey. Unencrypted traffic is also 
easier to inspect with tcpdump when troubleshooting. However, we cannot 
sacrifice baseline production security for configuration and debugging 
convenience. Distributing an authkey is a standard, one-time operation 
easily handled by modern automation, and debugging practices must evolve 
to match secure standards.

- Lower Memory Footprint of UDP(U):
Legacy UDP and UDPU transports have a smaller memory footprint compared 
to KNET, which is sometimes preferred in embedded devices or heavily 
constrained edge environments. If you are running an embedded system 
where KNET's memory usage is a dealbreaker, the default build is simply 
not for you - the --enable-udpu flag exists specifically for this hardware 
profile.

- Performance and Latency Overhead:
Some might argue that forcing crypto adds latency. Modern CPUs handle 
AES (especially via AES-NI) with incredible efficiency, making the 
overhead negligible for 99% of workloads. For the 1% (like 
ultra-low-latency HPC) where every microsecond is critical, the 
compile-time escape hatch allows you to bypass it.

- Upgrade Path and Distro Breakage:
Existing unencrypted clusters will fail to start if they upgrade to a 
standard default build of this new version. While we take backwards 
compatibility seriously, security must eventually take priority over 
legacy convenience. Distro maintainers can choose to compile with the 
escape hatch flags if they absolutely must maintain seamless upgrades 
for a specific release cycle.
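
On the authkey point under Setup Complexity above: corosync-keygen is 
the usual way to create the key, but as an illustration of how little an 
automation step has to do, here is a hedged Python sketch. The function 
name write_authkey and the 256-byte default are assumptions of this 
sketch (check the key sizes your Corosync version accepts); the key is 
simply random bytes written to a file readable only by its owner, which 
then has to be copied to every node over a secure channel.

```python
import os
import secrets


def write_authkey(path, size=256):
    """Write `size` random bytes to `path` with owner-read-only permissions.

    Refuses to overwrite an existing file, so a key already in use is
    never silently replaced. Returns the number of bytes written.
    """
    key = secrets.token_bytes(size)
    # O_EXCL makes creation fail if the file already exists; mode 0400
    # keeps the key unreadable for group and others from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    with os.fdopen(fd, "wb") as handle:
        handle.write(key)
    return len(key)
```

Distribution to the other nodes is deliberately left out, since that 
part depends on whatever configuration management is already in place.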

Feedback requested:
Before moving forward, I would like to get the community's eyes on this 
patchset. I am specifically looking for technical and architectural 
feedback.

Because the compile-time escape hatch exists, niche use cases are fully 
covered. I'm not looking for "we've always done it this way" feedback, 
but if there is a fundamental, fact-based technical reason why a 
standard, modern Corosync deployment cannot enforce encryption by 
default that I haven't considered above, I definitely want to hear it.

Please take a look at the PR and drop your reviews, ACKs, or technical 
concerns either here on the list or directly on GitHub.

Thanks,
   Honza