[ClusterLabs] Antw: [EXT] no-quorum-policy=stop never executed, pacemaker stuck in election/integration, corosync running in "new membership" cycles with itself
Ulrich.Windl at rz.uni-regensburg.de
Tue Jun 1 07:18:43 EDT 2021
I can't answer, but I doubt the usefulness of "no-quorum-policy=stop":
If nodes lose quorum, they try to stop all resources, but "remain" in the
cluster (they will respond to network queries, if any arrive).
If one of those "stop"s fails, the other part of the cluster never knows.
So what can be done? Should the "other (left)" part of the cluster start
resources, assuming the "other (right)" part of the cluster had stopped them?
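Whether a partition "loses quorum" in the first place is plain majority arithmetic: a node cut off on every link sees only its own vote. Below is a minimal Python sketch of that arithmetic. It assumes one vote per node and none of votequorum's special options (two_node, wait_for_all, auto_tie_breaker, last_man_standing, or a quorum device). quorum_threshold and is_quorate are hypothetical helper names, not corosync API.

```python
"""Majority-quorum arithmetic, sketched under simplifying assumptions.

Assumptions (not from the thread): every node carries exactly one vote and
no special votequorum options or quorum device are in effect.
"""


def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    if expected_votes < 1:
        raise ValueError("expected_votes must be at least 1")
    return expected_votes // 2 + 1


def is_quorate(votes_in_partition: int, expected_votes: int) -> bool:
    """True if a partition holding votes_in_partition votes has quorum."""
    return votes_in_partition >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    for total in (2, 3, 5):
        isolated = 1  # a node cut off on all links sees only its own vote
        rest = total - isolated
        print(
            f"{total} nodes: threshold={quorum_threshold(total)}, "
            f"isolated node quorate={is_quorate(isolated, total)}, "
            f"remaining {rest} quorate={is_quorate(rest, total)}"
        )
```

In the 2-node case neither half is quorate under plain majority. This is why two-node clusters normally lean on votequorum's two_node option, and on fencing, rather than on majority alone.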
>>> Lars Ellenberg <lars.ellenberg at linbit.com> wrote on 01.06.2021 at 12:52 in message
<CANr6vz-rbS3BnuJsxhQzRnMpJe1u+NPhqp+ejNJWnHDScZwSRg at mail.gmail.com>:
> pcmk 2.0.5, corosync 3.1.0, knet, rhel8
> I know fencing "solves" this just fine.
> what I'd like to understand though is: what exactly is corosync or
> pacemaker waiting for here,
> why does it not manage to get to the stage where it would even attempt
> to "stop" stuff?
> two "rings" aka knet interfaces.
> node isolation test with iptables,
> INPUT/OUTPUT -j DROP on one interface, shortly after on the second as well.
> node loses quorum (obviously).
> pacemaker is expected to no-quorum-policy=stop,
> but is "stuck" in Election -> Integration,
> while corosync "cycles" between "new membership" (with only itself)
> and "token has not been received in ...", "sync members ...", "new
> membership has formed ...".
> I would have expected corosync to come back with a "stable non-quorate
> membership" of just itself
> within a very short period of time, and pacemaker winning the
> "election"/"integration" with just itself,
> and then trying to call "stop" on everything it knows about.
> I'm asking for hints what to look for in the logs, or how to drill
> down further as to why that is not the case.
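A low-tech first step when drilling into the logs is to count how often the membership-churn messages quoted above appear while the node is isolated. The minimal Python sketch below does that. Its substrings come from the wording in the report, not from corosync 3.1.0's exact log formats (which are not verified here), so adjust them to what your logs actually say. PATTERNS, count_churn and the command-line log path are hypothetical.

```python
"""Count corosync membership-churn messages in a log file (sketch only).

The substrings are taken from the wording in the report and matched
case-insensitively; real corosync messages may differ between versions.
"""
import sys
from collections import Counter
from typing import Iterable, List

# Hypothetical labels mapped to substrings taken from the report's wording.
PATTERNS = {
    "token_timeout": "token has not been received",
    "sync_members": "sync members",
    "new_membership": "new membership",
}


def count_churn(lines: Iterable[str]) -> Counter:
    """Return, per label, how many lines contain its substring."""
    counts: Counter = Counter()
    for line in lines:
        lowered = line.lower()
        for name, needle in PATTERNS.items():
            if needle in lowered:
                counts[name] += 1
    return counts


def main(argv: List[str]) -> int:
    """Print per-label counts for the log file named on the command line."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} COROSYNC_LOG", file=sys.stderr)
        return 2
    with open(argv[1], encoding="utf-8", errors="replace") as fh:
        counts = count_churn(fh)
    for name in PATTERNS:
        print(f"{name}: {counts[name]}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

For example, run `python3 churn.py /path/to/corosync.log` (file name and path are placeholders) against the isolated node's log. A new_membership count that keeps climbing as long as the node stays isolated is consistent with the cycling described. It does not by itself explain why Pacemaker never gets past Election/Integration.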