[ClusterLabs] Pacemaker/corosync behavior in case of partial split brain

Andrei Borzenkov arvidjaar at gmail.com
Fri Aug 6 07:33:53 EDT 2021

On Thu, Aug 5, 2021 at 9:25 PM Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> Three nodes A, B, C. Communication between A and B is blocked
> (completely - no packets can pass in either direction). A and B can
> both communicate with C.
> I expected that the result would be two partitions - (A, C) and (B, C). To my
> surprise, A went offline, leaving (B, C) running. It was always the same
> node (with node id 1, if it matters, out of 1, 2, 3).
> How is the surviving partition determined in this case?

For the sake of the archives - this is how the Totem protocol works. Which
node gets isolated is non-deterministic and depends on whether C
receives a message from A or from B first. A marks B as unreachable
(failed) and sends a message saying so to C; once C gets this message
it also marks B as failed and ignores further messages from it (which
in turn causes B to mark C as failed). The cluster is then split
into two partitions - (A, C) and B. B sends exactly the same kind of
message, marking A as failed, and if that one reaches C first the
result is (B, C) and A instead. Both messages are sent after the
consensus timeout, so at approximately the same moment.
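
To make the race easier to picture, here is a minimal sketch in Python. It is a toy model of the behavior described above, not Corosync or Totem code: the function names are made up for illustration, and treating the race as a fair coin flip is my assumption, not something the protocol guarantees.

```python
import random


def resolve_partial_split(nodes, blocked_pair, first_sender):
    """Toy model (hypothetical helper, not a Corosync API).

    blocked_pair: the two nodes that cannot reach each other.
    first_sender: the node of that pair whose "peer failed" message
                  reaches the remaining nodes first.
    Returns (sorted surviving partition, isolated node).
    """
    a, b = blocked_pair
    if first_sender not in (a, b):
        raise ValueError("first_sender must be one of the blocked nodes")
    # The sender accuses the peer it cannot reach.
    accused = b if first_sender == a else a
    # The remaining nodes accept the first report and drop the accused node.
    survivors = sorted(n for n in nodes if n != accused)
    return survivors, accused


def simulate(trials=1000, seed=0):
    """Count how often each node ends up isolated.

    Assumption: either message is equally likely to arrive first.
    """
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0}
    for _ in range(trials):
        first = rng.choice(["A", "B"])
        _, isolated = resolve_partial_split(["A", "B", "C"], ("A", "B"), first)
        counts[isolated] += 1
    return counts
```

Calling resolve_partial_split(["A", "B", "C"], ("A", "B"), "A") gives the (A, C) partition with B isolated; passing "B" as the first sender gives (B, C) with A isolated. In a real cluster the outcome depends on actual message timing, so it need not be evenly distributed.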

> Can I be sure the same will also work in the case of multiple nodes? I.e. if
> I have two sites with an equal number of nodes and a third site as
> witness, and connectivity between the multi-node sites is lost but each site
> can still communicate with the witness. Will one site go offline? Which one?

This should work exactly the same way, and which site gets isolated is
just as non-deterministic. Moreover, it will also be non-deterministic
if the number of nodes on the sites without connectivity is different
(at least I do not see anything in Totem that would depend on the
number of nodes, unless Corosync adds some external knobs here). So in
the case of sites A and B with 3 nodes each and site C with 1 node,
with site A losing connectivity to C, we may equally end up with a 6+1
split (C isolated) or a 3+4 split (A isolated).
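
The arithmetic for this example can be checked with a short Python sketch. Again this is only an illustration under simplifying assumptions: the helper name is hypothetical, and it treats each site as a single unit that is kept or dropped as a whole, whereas in reality every node is an individual cluster member.

```python
def split_sizes(site_sizes, blocked_sites, first_sender):
    """Toy model (hypothetical helper, not a Corosync API).

    site_sizes:    mapping of site name to number of nodes.
    blocked_sites: the two sites that cannot reach each other.
    first_sender:  the site of that pair whose failure report is
                   accepted first by the rest of the cluster.
    Returns (surviving node count, isolated node count).
    """
    x, y = blocked_sites
    if first_sender not in (x, y):
        raise ValueError("first_sender must be one of the blocked sites")
    # The site that loses the race is dropped, regardless of its size.
    accused = y if first_sender == x else x
    isolated = site_sizes[accused]
    surviving = sum(site_sizes.values()) - isolated
    return surviving, isolated


sites = {"A": 3, "B": 3, "C": 1}
print(split_sizes(sites, ("A", "C"), "A"))  # (6, 1): C is isolated
print(split_sizes(sites, ("A", "C"), "C"))  # (4, 3): A is isolated
```

Note that nothing in the model looks at the node counts when picking the loser, which is exactly the point: the larger side has no built-in advantage.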
