[ClusterLabs] Corosync quorum vs. pacemaker quorum confusion

Ken Gaillot kgaillot at redhat.com
Wed Jan 17 20:19:12 EST 2018


On Thu, 2017-12-07 at 07:33 +0300, Andrei Borzenkov wrote:
> 07.12.2017 00:28, Klaus Wenninger пишет:
> > On 12/06/2017 08:03 PM, Ken Gaillot wrote:
> > > On Sun, 2017-12-03 at 14:03 +0300, Andrei Borzenkov wrote:
> > > > I assumed that with corosync 2.x quorum is maintained by
> > > > corosync and pacemaker simply gets yes/no. Apparently this is
> > > > more complicated.
> > > 
> > > It shouldn't be, but everything in HA-land is complicated :)
> > > 
> > > > Trivial two-node test cluster (two_node is intentionally not
> > > > set, to simulate "normal" behavior).
> > > > 
> 
> ...
> > > > 
> > > > Dec 03 13:52:07 [1633] ha1       crmd:   notice: pcmk_quorum_notification:  Quorum acquired | membership=240 members=1
> > > > Dec 03 13:52:07 [1626] ha1 pacemakerd:   notice: pcmk_quorum_notification:  Quorum acquired | membership=240 members=1
> > > > 
> 
> ...
> > > > 
> > > > Confused ... is it intentional behavior or a bug?
> > > 
> > > The no-quorum-policy message above shouldn't prevent the cluster
> > > from either fencing other nodes or starting resources, once
> > > quorum is obtained from corosync. I'm not sure from the
> > > information here why that didn't happen.
> > 
> > This is obviously a cluster with 2 nodes. Is it configured as
> > 2-node in corosync as well? If yes, the wait-for-all logic might be
> > confused somehow.
> 
> No, as I mentioned initially, I explicitly disabled wait_for_all.
> 
> ha1:~ # corosync-cmapctl quorum.
> quorum.expected_votes (u32) = 2
> quorum.provider (str) = corosync_votequorum
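
For reference, a corosync.conf quorum section matching the cmap values
above, with two_node and wait_for_all deliberately left out, would look
roughly like this (illustrative only, not the poster's actual config):

quorum {
        provider: corosync_votequorum
        expected_votes: 2
        # two_node and wait_for_all intentionally not set, so quorum
        # is lost as soon as either node goes away
}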
> 
> > Which version of the sbd-daemon are you running?
> 
> ha1:~ # rpm -q sbd
> sbd-1.3.0-3.3.x86_64
> 
> although I do not see how exactly it matters in this case, as
> pacemaker never tells sbd to do anything.
> 
> > As of 1.3.1 (actually a few commits before, iirc), in the case of a
> > 2-node configuration in corosync, sbd wouldn't rely on quorum
> > coming from corosync but rather count the nodes seen by itself.
> > Just in case you see nodes suicide on loss of the disk while still
> > being signaled quorate from corosync, either due to the 2-node
> > config or some fake config as you did...
> > 
> 
> The suicide I saw was due to no-quorum-policy=suicide.
> 
> > Regards,
> > Klaus
> > 
> > > 
> > > I'd first check the Pacemaker logs for "Quorum acquired" and
> > > "Quorum lost" messages. These indicate when Pacemaker received
> > > notifications from corosync.
> 
> As shown in the original post, I did have "Quorum acquired" messages.
> 
> 
> > > Assuming those were received properly, the DC should then
> > > recalculate what needs to be done, and the logs at that point
> > > should not have any of the messages about not having quorum.
> 
> So I redid this using the default no-quorum-policy=stop and one more
> non-stonith resource.
> 
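As a side note, for anyone reproducing this, the policy can be switched
between tests with crm_attribute; a rough example, not taken from the
poster's session:

  crm_attribute --type crm_config --name no-quorum-policy --update stop
  crm_attribute --type crm_config --name no-quorum-policy --query
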
> ...and just before I hit "send", the cluster recovered. So it appears
> that the "Quorum acquired" event does not trigger (immediate)
> re-evaluation of policy until some timeout.
> 
> Dec 07 07:05:16 ha1 pacemakerd[1888]:   notice: Quorum acquired
> 
> Nothing happens until the next messages:
> 
> Dec 07 07:12:58 ha1 crmd[1894]:   notice: State transition S_IDLE -> S_POLICY_ENGINE
> Dec 07 07:12:58 ha1 pengine[1893]:   notice: Watchdog will be used via SBD if fencing is required
> Dec 07 07:12:58 ha1 pengine[1893]:  warning: Scheduling Node ha2 for STONITH
> Dec 07 07:12:58 ha1 pengine[1893]:   notice:  * Fence (reboot) ha2 'node is unclean'
> Dec 07 07:12:58 ha1 pengine[1893]:   notice:  * Start      stonith-sbd      (   ha1 )
> Dec 07 07:12:58 ha1 pengine[1893]:   notice:  * Start      rsc_dummy_1      (   ha1 )
> 
> This is apparently the 15-minute cluster-recheck-interval timer (give
> or take):
> 
> Dec 07 06:57:35 [1888] ha1 pacemakerd:     info: crm_log_init:  Changed active directory to /var/lib/pacemaker/cores
> ...
> Dec 07 07:12:58 [1894] ha1       crmd:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
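
That matches the default cluster-recheck-interval of 15 minutes. As a
stop-gap it can be queried and shortened with crm_attribute (the value
below is just an example):

  crm_attribute --type crm_config --name cluster-recheck-interval --query
  crm_attribute --type crm_config --name cluster-recheck-interval --update 2min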
> 
> OK, at least we know why it happens. Whether this is intentional
> behavior is another question :)

1.1.17 had an attempted fix (commit 0b68905) for the opposite
situation, where the cluster was not reacting to quorum loss
immediately unless there were resources running.

Looking at that again, I see it was an incomplete fix.

The sequence for either quorum acquisition or loss is:
- the cluster gains or loses quorum
- corosync notifies pacemaker's crmd
- crmd updates the quorum attribute of the CIB's cib tag, as shown
below (it will generally also update the node_state tag of the node
that caused the quorum change)
- cib notifies crmd when the CIB write completes
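
The attribute in question is the have-quorum flag on the CIB's
top-level cib tag. As a rough illustration (other attributes trimmed,
values will differ):

  cibadmin --query | head -n 1
  <cib ... have-quorum="1" dc-uuid="..." epoch="..." num_updates="...">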

At that point in the sequence (the XML_TAG_CIB handling in
te_update_diff()), I believe we should check the cib tag and abort the
transition if quorum changed, but currently we don't do anything. I'm
guessing it has gone unnoticed this long because there is usually other
activity happening when quorum changes that triggers a new transition
anyway.
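
A quick way to check for that on an affected cluster is to grep the
logs for quorum changes and the transitions that follow them (log
location varies by distribution):

  grep -E 'Quorum (acquired|lost)|State transition' /var/log/messages

If a "Quorum acquired" message is not followed by a new transition
until the recheck timer pops, that would be the same symptom.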

I'll test that theory.
-- 
Ken Gaillot <kgaillot at redhat.com>



