[ClusterLabs] no-quorum-policy=stop never executed, pacemaker stuck in election/integration, corosync running in "new membership" cycles with itself

Lars Ellenberg lars.ellenberg at linbit.com
Tue Jun 1 06:52:29 EDT 2021

Setup: pacemaker 2.0.5, corosync 3.1.0 (knet), RHEL 8.
I know fencing "solves" this just fine.

What I'd like to understand, though, is: what exactly is corosync or
pacemaker waiting for here, and why does it not even get to the stage
where it would attempt to "stop" stuff?

two "rings" aka knet interfaces.
node isolation test with iptables,
INPUT/OUTPUT -j DROP on one interface, shortly after on the second as well.
The node loses quorum (obviously).
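
For reference, the isolation step can be sketched roughly like this. The
interface names (eth1, eth2) and the delay between the two steps are
placeholders, not the actual values from my test, and DRY_RUN defaults to
only printing the commands:

```shell
#!/bin/sh
# Sketch of the node isolation test: drop all traffic on the first knet
# interface, then shortly after on the second one.
# ASSUMPTIONS: eth1/eth2 and the 5 second delay are placeholders.
RING0_IF="${RING0_IF:-eth1}"
RING1_IF="${RING1_IF:-eth2}"
DELAY="${DELAY:-5}"
# DRY_RUN=1 (the default) prints the commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Drop everything coming in on and going out of one interface.
drop_interface() {
    run iptables -A INPUT  -i "$1" -j DROP
    run iptables -A OUTPUT -o "$1" -j DROP
}

# Isolate the node: first ring, short pause, second ring.
isolate_node() {
    drop_interface "$RING0_IF"
    sleep "$DELAY"
    drop_interface "$RING1_IF"
}
```

Calling isolate_node with DRY_RUN=0 (as root) applies the rules for real;
they are appended with -A, so the matching -D calls undo them.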

pacemaker is expected to act on no-quorum-policy=stop,
but is "stuck" in Election -> Integration,
while corosync "cycles" between "new membership" (with only itself, obviously),
"token has not been received in ...", "sync members ...", and
"new membership has formed ...".

I would have expected corosync to come back with a "stable non-quorate
membership" of just itself within a very short period of time,
and pacemaker to win the "election"/"integration" with just itself
and then try to call "stop" on everything it knows about.
I'm asking for hints on what to look for in the logs, or how to drill
down further into why that is not the case.


