[ClusterLabs] Corosync quorum vs. pacemaker quorum confusion

Klaus Wenninger kwenning at redhat.com
Wed Dec 6 21:28:00 UTC 2017


On 12/06/2017 08:03 PM, Ken Gaillot wrote:
> On Sun, 2017-12-03 at 14:03 +0300, Andrei Borzenkov wrote:
>> I assumed that with corosync 2.x quorum is maintained by corosync and
>> pacemaker simply gets yes/no. Apparently this is more complicated.
> It shouldn't be, but everything in HA-land is complicated :)
>
>> A trivial test two-node cluster (two_node is intentionally not set, to
>> simulate "normal" behavior).
>>
>> ha1:~ # crm configure show
>> node 1084752129: ha1
>> node 1084752130: ha2
>> primitive stonith-sbd stonith:external/sbd \
>> 	params pcmk_delay_max=30s
>> property cib-bootstrap-options: \
>> 	have-watchdog=true \
>> 	dc-version=1.1.17-3.3-36d2962a8 \
>> 	cluster-infrastructure=corosync \
>> 	cluster-name=hacluster \
>> 	stonith-enabled=true \
>> 	placement-strategy=balanced \
>> 	stonith-timeout=172 \
>> 	no-quorum-policy=suicide
>> rsc_defaults rsc-options: \
>> 	resource-stickiness=1 \
>> 	migration-threshold=3
>> op_defaults op-options: \
>> 	timeout=600 \
>> 	record-pending=true
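>>
>> (As a sanity check - the exact cmap key names vary with the corosync
>> version, so treat this as a sketch - corosync's runtime cmap can
>> confirm that two_node is indeed unset:
>>
>> ha1:~ # corosync-cmapctl | grep two_node
>>
>> With two_node not configured, this should print nothing or show a
>> value of 0.)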
>>
>> I boot one node.
>>
>> ha1:~ # corosync-quorumtool
>> Quorum information
>> ------------------
>> Date:             Sun Dec  3 13:44:55 2017
>> Quorum provider:  corosync_votequorum
>> Nodes:            1
>> Node ID:          1084752129
>> Ring ID:          1084752129/240
>> Quorate:          No
>>
>> Votequorum information
>> ----------------------
>> Expected votes:   2
>> Highest expected: 2
>> Total votes:      1
>> Quorum:           2 Activity blocked
>> Flags:
>>
>> Membership information
>> ----------------------
>>     Nodeid      Votes Name
>> 1084752129          1 ha1 (local)
>>
>>
>> ha1:~ # crm_mon -1rf
>> Stack: corosync
>> Current DC: ha1 (version 1.1.17-3.3-36d2962a8) - partition WITHOUT quorum
>> Last updated: Sun Dec  3 13:48:46 2017
>> Last change: Sun Dec  3 12:09:19 2017 by root via cibadmin on ha1
>>
>> 2 nodes configured
>> 1 resource configured
>>
>> Node ha2: UNCLEAN (offline)
>> Online: [ ha1 ]
>>
>> Full list of resources:
>>
>>  stonith-sbd	(stonith:external/sbd):	Stopped
>>
>> Migration Summary:
>> * Node ha1:
>>
>> So far that's expected. We are out of quorum, so nothing happens.
>> The first surprise, though, was this message (which confirmed past
>> empirical observations):
>>
>> Dec 03 13:44:57 [1632] ha1    pengine:   notice: stage6:	Cannot fence
>> unclean nodes until quorum is attained (or no-quorum-policy is set to
>> ignore)
>>
>> I assume this is intentional behavior, in which case it would be
>> really good to have it mentioned in the documentation as well. So far
>> I have not seen a comprehensive explanation of pacemaker's startup
>> logic (what it decides to do, and when).
> Yes, it's intentional. It's not specific to start-up, but to anytime
> the cluster doesn't have quorum. It is (ambiguously) documented under
> the have-quorum property:
>
> "Indicates if the cluster has quorum. If false, this may mean that the
> cluster cannot start resources or fence other nodes."
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_cib_properties
>
> Some exceptions that should be documented:
>
> * Quorum only applies to fencing initiated by Pacemaker. If an
> administrator or application (such as DLM) initiates fencing via
> stonith_admin or the stonithd API, that bypasses Pacemaker's fencing
> policies (see the example below).
>
> * If no-quorum-policy=ignore, then loss of quorum will not prevent
> fencing.
>
> * A partition without quorum can fence any node that is a member of
> that partition. (As a side effect, this allows no-quorum-policy=suicide 
> to work.)
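>
> For instance, manually fencing ha2 from ha1 - a rough sketch, the exact
> options depend on the pacemaker version - goes straight through stonithd
> and is therefore not subject to the quorum check:
>
> ha1:~ # stonith_admin --reboot ha2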
>
>
>> OK, let's pretend we have quorum.
>>
>> ha1:~ # corosync-cmapctl -s quorum.expected_votes u32 1
>> ha1:~ # corosync-quorumtool
>> Quorum information
>> ------------------
>> Date:             Sun Dec  3 13:52:19 2017
>> Quorum provider:  corosync_votequorum
>> Nodes:            1
>> Node ID:          1084752129
>> Ring ID:          1084752129/240
>> Quorate:          Yes
>>
>> Votequorum information
>> ----------------------
>> Expected votes:   1
>> Highest expected: 1
>> Total votes:      1
>> Quorum:           1
>> Flags:            Quorate
>>
>> Membership information
>> ----------------------
>>     Nodeid      Votes Name
>> 1084752129          1 ha1 (local)
>>
>> So corosync apparently believes we are in quorum now. What does
>> pacemaker do?
>>
>> ha1:~ # crm_mon -1rf
>> Stack: corosync
>> Current DC: ha1 (version 1.1.17-3.3-36d2962a8) - partition with quorum
>> Last updated: Sun Dec  3 13:53:22 2017
>> Last change: Sun Dec  3 12:09:19 2017 by root via cibadmin on ha1
>>
>> 2 nodes configured
>> 1 resource configured
>>
>> Node ha2: UNCLEAN (offline)
>> Online: [ ha1 ]
>>
>> Full list of resources:
>>
>>  stonith-sbd	(stonith:external/sbd):	Stopped
>>
>> Migration Summary:
>> * Node ha1:
>>
>> Nothing really changed. Although it quite clearly reports that we are
>> in quorum, it still won't start any resource or attempt to fence the
>> other node. Although the logs say
>>
>> Dec 03 13:52:07 [1633] ha1       crmd:   notice: pcmk_quorum_notification:
>> Quorum acquired | membership=240 members=1
>> Dec 03 13:52:07 [1626] ha1 pacemakerd:   notice: pcmk_quorum_notification:
>> Quorum acquired | membership=240 members=1
>>
>> There is still *no* attempt to do anything.
>>
>> This may be related to previous message
>>
>> Dec 03 13:44:57 [1629] ha1 stonith-ng:   notice: unpack_config:
>> Resetting no-quorum-policy to 'stop': cluster has never had quorum
> The above message applies when no-quorum-policy=suicide. When the
> cluster is first starting up, it doesn't have quorum ... if we honored
> no-quorum-policy in that case, you could never start the cluster. So,
> suicide doesn't become effective until the cluster gets quorum the
> first time.
>
>> Which opens up a question - where can I see this temporary value for
>> no-quorum-policy? It is not present in the CIB, so how can I query
>> the "effective" value of the property?
> grep /var/log/messages :)
>
> Such temporary overrides have no persistent location to be queried;
> they are calculated on-the-fly when needed. Hence, the log messages.
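>
> Something along these lines (the log location varies by distribution)
> will show when the override was applied:
>
> ha1:~ # grep "Resetting no-quorum-policy" /var/log/messages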
>
>> Still, even though pacemaker does not attempt to actually start
>> resources, it apparently believes it was in quorum, because as soon
>> as I increase the number of votes back to 2, the node immediately
>> resets (due to no-quorum-policy=suicide).
> The no-quorum-policy message above shouldn't prevent the cluster from
> either fencing other nodes or starting resources, once quorum is
> obtained from corosync. I'm not sure from the information here why that
> didn't happen.

This is obviously a cluster with 2 nodes. Is it configured as 2-node
in corosync as well? If yes, the wait-for-all logic might be getting
confused somehow.
Which version of the sbd-daemon are you running?
As of 1.3.1 (actually a few commits before that, iirc), with a 2-node
configuration in corosync, sbd doesn't rely on the quorum reported by
corosync but instead counts the nodes it sees itself. That matters in
case you see nodes suicide on loss of the disk while corosync still
signals them as quorate, either because of the 2-node config or a
faked config like the one you used...
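
For reference, a 2-node setup in corosync.conf would contain a quorum
stanza roughly like this (sketch only; note that two_node implicitly
enables wait_for_all unless that is explicitly disabled):

quorum {
        provider: corosync_votequorum
        two_node: 1
}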

Regards,
Klaus

>
> I'd first check the Pacemaker logs for "Quorum acquired" and "Quorum
> lost" messages. These indicate when Pacemaker received notifications
> from corosync. Assuming those were received properly, the DC should
> then recalculate what needs to be done, and the logs at that point
> should not have any of the messages about not having quorum.
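>
> For example (the log file name and path depend on the distribution and
> on PCMK_logfile):
>
> ha1:~ # grep -E "Quorum (acquired|lost)" /var/log/pacemaker.log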




