[ClusterLabs] Corosync quorum vs. pacemaker quorum confusion

Ken Gaillot kgaillot at redhat.com
Thu Jan 18 16:41:52 EST 2018


On Wed, 2018-01-17 at 19:19 -0600, Ken Gaillot wrote:
> On Thu, 2017-12-07 at 07:33 +0300, Andrei Borzenkov wrote:
> > 07.12.2017 00:28, Klaus Wenninger пишет:
> > > On 12/06/2017 08:03 PM, Ken Gaillot wrote:
> > > > On Sun, 2017-12-03 at 14:03 +0300, Andrei Borzenkov wrote:
> > > > > I assumed that with corosync 2.x quorum is maintained by corosync
> > > > > and pacemaker simply gets a yes/no answer. Apparently this is more
> > > > > complicated.
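> > > > >
> > > > > (corosync-quorumtool -s shows that yes/no directly, on its
> > > > > "Quorate:" line.)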
> > > > 
> > > > It shouldn't be, but everything in HA-land is complicated :)
> > > > 
> > > > > Trivial test with a two-node cluster (two_node is intentionally
> > > > > not set, to simulate "normal" behavior).
> > > > > 
> > 
> > ...
> > > > > 
> > > > > Dec 03 13:52:07 [1633] ha1       crmd:   notice: pcmk_quorum_notification:	Quorum acquired | membership=240 members=1
> > > > > Dec 03 13:52:07 [1626] ha1 pacemakerd:   notice: pcmk_quorum_notification:	Quorum acquired | membership=240 members=1
> > > > > 
> > 
> > ...
> > > > > 
> > > > > Confused ... is it intentional behavior or a bug?
> > > > 
> > > > The no-quorum-policy message above shouldn't prevent the cluster
> > > > from either fencing other nodes or starting resources, once quorum
> > > > is obtained from corosync. I'm not sure from the information here
> > > > why that didn't happen.
> > > 
> > > This is obviously a cluster with 2 nodes. Is it configured as 2-node
> > > in corosync as well? If yes, the wait-for-all logic might be confused
> > > somehow.
> > 
> > No, as I mentioned initially, I explicitly disabled wait_for_all.
> > 
> > ha1:~ # corosync-cmapctl quorum.
> > quorum.expected_votes (u32) = 2
> > quorum.provider (str) = corosync_votequorum
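> >
> > For reference, a corosync.conf quorum section matching the values above
> > would look roughly like this (a sketch, not the actual file):
> >
> >   quorum {
> >       provider: corosync_votequorum
> >       expected_votes: 2
> >       # two_node deliberately not set; wait_for_all explicitly disabled
> >       wait_for_all: 0
> >   }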
> > 
> > > Which version of the sbd-daemon are you running?
> > 
> > ha1:~ # rpm -q sbd
> > sbd-1.3.0-3.3.x86_64
> > 
> > although I do not see how exactly it matters in this case, as pacemaker
> > never tells sbd to do anything.
> > 
> > > As of 1.3.1 (actually a few commits before, iirc), in case of a
> > > 2-node configuration in corosync, sbd wouldn't rely on quorum coming
> > > from corosync but rather count the nodes seen by itself. Just in case
> > > you see nodes suicide on loss of the disk while still being signaled
> > > quorate by corosync, either due to the 2-node config or some fake
> > > config as you did...
> > > 
> > 
> > I had suicide due to no-quorum-policy=suicide.
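> >
> > (Set for this test roughly as
> >
> >   crm configure property no-quorum-policy=suicide
> >
> > with crmsh; the later test below goes back to the default "stop".)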
> > 
> > > Regards,
> > > Klaus
> > > 
> > > > 
> > > > I'd first check the Pacemaker logs for "Quorum acquired" and
> > > > "Quorum lost" messages. These indicate when Pacemaker received
> > > > notifications from corosync.
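> > > >
> > > > (e.g. grep -e "Quorum acquired" -e "Quorum lost" /var/log/messages,
> > > > or wherever pacemaker logs on your distribution)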
> > 
> > As shown in the original post, I did have "Quorum acquired" messages.
> > 
> > 
> > > > Assuming those were received properly, the DC should then
> > > > recalculate what needs to be done, and the logs at that point
> > > > should not have any of the messages about not having quorum.
> > 
> > So I redid this using the default no-quorum-policy=stop and one more
> > non-stonith resource.
> > 
> > ...and just before I hit "send", the cluster recovered. So it appears
> > that the "Quorum acquired" event does not trigger (immediate)
> > re-evaluation of policy until some timeout.
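> >
> > (While the cluster sits idle like that, something along the lines of
> >
> >   crm_simulate -SL
> >
> > against the live CIB should show the same fence/start actions that
> > eventually appear below, i.e. they are only waiting for a new
> > transition to be kicked off.)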
> > 
> > Dec 07 07:05:16 ha1 pacemakerd[1888]:   notice: Quorum acquired
> > 
> > Nothing happens until the next message
> > 
> > Dec 07 07:12:58 ha1 crmd[1894]:   notice: State transition S_IDLE -> S_POLICY_ENGINE
> > Dec 07 07:12:58 ha1 pengine[1893]:   notice: Watchdog will be used via SBD if fencing is required
> > Dec 07 07:12:58 ha1 pengine[1893]:  warning: Scheduling Node ha2 for STONITH
> > Dec 07 07:12:58 ha1 pengine[1893]:   notice:  * Fence (reboot) ha2 'node is unclean'
> > Dec 07 07:12:58 ha1 pengine[1893]:   notice:  * Start      stonith-sbd     (   ha1 )
> > Dec 07 07:12:58 ha1 pengine[1893]:   notice:  * Start      rsc_dummy_1     (   ha1 )
> > 
> > This is apparently the 15-minute cluster-recheck-interval timer (give
> > or take):
> > 
> > Dec 07 06:57:35 [1888] ha1 pacemakerd:     info: crm_log_init:	Changed active directory to /var/lib/pacemaker/cores
> > ...
> > Dec 07 07:12:58 [1894] ha1       crmd:   notice: do_state_transition:	State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
> > 
> > OK, at least we know why it happens. Whether this is intentional
> > behavior is another question :)
> 
> 1.1.17 had an attempted fix (commit 0b68905) for the opposite
> situation, where the cluster was not reacting to quorum loss
> immediately unless there were resources running.
> 
> Looking at that again, I see it was an incomplete fix.
> 
> The sequence for either quorum acquisition or loss is:
> - cluster gains or loses quorum
> - corosync notifies pacemaker's crmd
> - crmd updates the have-quorum attribute of the CIB's cib tag (it will
> generally also update the node_state tag of the node that caused the
> quorum change)
> - cib notifies crmd when CIB write completes
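>
> (You can watch that flag flip with something like
>
>   cibadmin --query | head -n 1
>
> which prints the cib element with have-quorum="0" or "1".)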
> 
> At this point (XML_TAG_CIB handling in te_update_diff()), I believe we
> should check the cib tag, and abort the transition if quorum changed,
> but we currently don't do anything. I'm guessing it's gone unnoticed
> this long because there is usually other activity happening when quorum
> changes that triggers a new transition.
> 
> I'll test that theory.

On further investigation, this was an artifact of how you reached
quorum. When quorum is acquired normally, by a node joining the
cluster, a new transition is triggered by the node join. However, when
artificially inducing quorum via corosync, there is no node join, and
thus no transition.
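
(For example, forcing quorum by lowering the expected votes at runtime,
with something like

  corosync-quorumtool -e 1

makes corosync report the lone node as quorate without any membership
change, so pacemaker sees "Quorum acquired" but no node join to react to.)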

I do think it would be more robust if we started a new transition on
any quorum change, so I'll still try that approach, but that explains
why this isn't normally an issue.
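
In the meantime, on setups that hit this, lowering cluster-recheck-interval
narrows the window, e.g. something like:

  crm_attribute --type crm_config --name cluster-recheck-interval --update 60s

(the same cluster property can of course be set via pcs or crmsh).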
-- 
Ken Gaillot <kgaillot at redhat.com>



