[ClusterLabs] Corosync quorum vs. pacemaker quorum confusion

Ken Gaillot kgaillot at redhat.com
Wed Dec 6 19:03:53 UTC 2017


On Sun, 2017-12-03 at 14:03 +0300, Andrei Borzenkov wrote:
> I assumed that with corosync 2.x quorum is maintained by corosync and
> pacemaker simply gets yes/no. Apparently this is more complicated.

It shouldn't be, but everything in HA-land is complicated :)

> 
> A trivial test two-node cluster (two_node is intentionally not set, to
> simulate "normal" behavior).
> 
> ha1:~ # crm configure show
> node 1084752129: ha1
> node 1084752130: ha2
> primitive stonith-sbd stonith:external/sbd \
> 	params pcmk_delay_max=30s
> property cib-bootstrap-options: \
> 	have-watchdog=true \
> 	dc-version=1.1.17-3.3-36d2962a8 \
> 	cluster-infrastructure=corosync \
> 	cluster-name=hacluster \
> 	stonith-enabled=true \
> 	placement-strategy=balanced \
> 	stonith-timeout=172 \
> 	no-quorum-policy=suicide
> rsc_defaults rsc-options: \
> 	resource-stickiness=1 \
> 	migration-threshold=3
> op_defaults op-options: \
> 	timeout=600 \
> 	record-pending=true
> 
> I boot one node.
> 
> ha1:~ # corosync-quorumtool
> Quorum information
> ------------------
> Date:             Sun Dec  3 13:44:55 2017
> Quorum provider:  corosync_votequorum
> Nodes:            1
> Node ID:          1084752129
> Ring ID:          1084752129/240
> Quorate:          No
> 
> Votequorum information
> ----------------------
> Expected votes:   2
> Highest expected: 2
> Total votes:      1
> Quorum:           2 Activity blocked
> Flags:
> 
> Membership information
> ----------------------
>     Nodeid      Votes Name
> 1084752129          1 ha1 (local)
> 
> 
> ha1:~ # crm_mon -1rf
> Stack: corosync
> Current DC: ha1 (version 1.1.17-3.3-36d2962a8) - partition WITHOUT quorum
> Last updated: Sun Dec  3 13:48:46 2017
> Last change: Sun Dec  3 12:09:19 2017 by root via cibadmin on ha1
> 
> 2 nodes configured
> 1 resource configured
> 
> Node ha2: UNCLEAN (offline)
> Online: [ ha1 ]
> 
> Full list of resources:
> 
>  stonith-sbd	(stonith:external/sbd):	Stopped
> 
> Migration Summary:
> * Node ha1:
> 
> So far that's expected. We are out of quorum, so nothing happens. The
> first surprise, though, was this message (which confirmed past
> empirical observations):
> 
> Dec 03 13:44:57 [1632] ha1    pengine:   notice: stage6: Cannot fence unclean nodes until quorum is attained (or no-quorum-policy is set to ignore)
> 
> I assume this is intentional behavior, in which case it would be really
> good to have it mentioned in the documentation as well. So far I have
> not seen a comprehensive explanation of Pacemaker's startup logic (what
> it decides to do, and when).

Yes, it's intentional. It's not specific to start-up; it applies anytime
the cluster doesn't have quorum. It is (ambiguously) documented under
the have-quorum property:

"Indicates if the cluster has quorum. If false, this may mean that the
cluster cannot start resources or fence other nodes."

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_cib_properties
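
For reference, a quick way to see the value the cluster is currently
using (just a sketch; the exact output formatting varies by version):

  # the CIB root element carries the have-quorum attribute
  cibadmin --query | grep -o 'have-quorum="[^"]*"'

  # crm_mon reports the same thing in its header line
  crm_mon -1 | grep -i quorum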

Some exceptions that should be documented:

* Quorum only applies to fencing initiated by Pacemaker. If an
administrator or application (such as DLM) initiates fencing via
stonith_admin or the stonithd API, that bypasses Pacemaker's fencing
policies (see the sketch after this list).

* If no-quorum-policy=ignore, then loss of quorum will not prevent
fencing.

* A partition without quorum can fence any node that is a member of
that partition. (As a side effect, this allows no-quorum-policy=suicide 
to work.)
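
As an illustration of the first point, a manual request like the
following goes straight to the fencer and is not subject to quorum or
no-quorum-policy (a sketch only; check the option names against your
stonith_admin version):

  # ask stonithd to reboot ha2, waiting up to 120s for a result
  stonith_admin --reboot ha2 --timeout 120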


> 
> OK, let's pretend we have quorum.
> 
> ha1:~ # corosync-cmapctl -s quorum.expected_votes u32 1
> ha1:~ # corosync-quorumtool
> Quorum information
> ------------------
> Date:             Sun Dec  3 13:52:19 2017
> Quorum provider:  corosync_votequorum
> Nodes:            1
> Node ID:          1084752129
> Ring ID:          1084752129/240
> Quorate:          Yes
> 
> Votequorum information
> ----------------------
> Expected votes:   1
> Highest expected: 1
> Total votes:      1
> Quorum:           1
> Flags:            Quorate
> 
> Membership information
> ----------------------
>     Nodeid      Votes Name
> 1084752129          1 ha1 (local)
> 
> So corosync apparently believes we are in quorum now. What does
> pacemaker do?
> 
> ha1:~ # crm_mon -1rf
> Stack: corosync
> Current DC: ha1 (version 1.1.17-3.3-36d2962a8) - partition with quorum
> Last updated: Sun Dec  3 13:53:22 2017
> Last change: Sun Dec  3 12:09:19 2017 by root via cibadmin on ha1
> 
> 2 nodes configured
> 1 resource configured
> 
> Node ha2: UNCLEAN (offline)
> Online: [ ha1 ]
> 
> Full list of resources:
> 
>  stonith-sbd	(stonith:external/sbd):	Stopped
> 
> Migration Summary:
> * Node ha1:
> 
> Nothing really changed. Although it quite clearly says we are in
> quorum, it still won't start any resource or attempt to fence the
> other node. Yet the logs say
> 
> Dec 03 13:52:07 [1633] ha1       crmd:   notice: pcmk_quorum_notification: Quorum acquired | membership=240 members=1
> Dec 03 13:52:07 [1626] ha1 pacemakerd:   notice: pcmk_quorum_notification: Quorum acquired | membership=240 members=1
> 
> There is still *no* attempt to do anything.
> 
> This may be related to the previous message
> 
> Dec 03 13:44:57 [1629] ha1 stonith-ng:   notice: unpack_config:
> Resetting no-quorum-policy to 'stop': cluster has never had quorum

The above message applies when no-quorum-policy=suicide. When the
cluster is first starting up, it doesn't have quorum ... if we honored
no-quorum-policy in that case, you could never start the cluster. So,
suicide doesn't become effective until the cluster gets quorum the
first time.
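
(Tangentially: for a real two-node cluster, rather than the simulated
"normal" behavior in this test, corosync's votequorum has a knob aimed
at this class of startup problem. Roughly, in corosync.conf:

  quorum {
          provider: corosync_votequorum
          # two_node also enables wait_for_all by default
          two_node: 1
  }

With two_node set, a single surviving node keeps quorum, and
wait_for_all keeps a freshly booted node from claiming quorum until it
has seen its peer at least once.)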

> Which opens up a question: where can I see this temporary value for
> no-quorum-policy? It is not present in the CIB, so how can I query the
> "effective" value of the property?

grep /var/log/messages :)

Such temporary overrides have no persistent location to be queried;
they are calculated on-the-fly when needed. Hence, the log messages.
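
Concretely, something like this should pull out the relevant messages
(adjust the path if your logs go somewhere other than /var/log/messages,
e.g. a dedicated pacemaker log file):

  # find where the effective no-quorum-policy was overridden
  grep -i "Resetting no-quorum-policy" /var/log/messages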

> Still, even though pacemaker does not actually attempt to start
> resources, it apparently believes it was in quorum, because as soon as
> I increase the number of votes back to 2, the node immediately resets
> (due to no-quorum-policy=suicide).
> 
> Confused ... is it intentional behavior or a bug?

The no-quorum-policy message above shouldn't prevent the cluster from
either fencing other nodes or starting resources, once quorum is
obtained from corosync. I'm not sure from the information here why that
didn't happen.

I'd first check the Pacemaker logs for "Quorum acquired" and "Quorum
lost" messages. These indicate when Pacemaker received notifications
from corosync. Assuming those were received properly, the DC should
then recalculate what needs to be done, and the logs at that point
should not have any of the messages about not having quorum.
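
For example (again assuming syslog; substitute your pacemaker log file
if you use one):

  # confirm the corosync membership/quorum notifications reached pacemaker
  grep -E "Quorum (acquired|lost)" /var/log/messages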
-- 
Ken Gaillot <kgaillot at redhat.com>



