[ClusterLabs Developers] problem with master score limited to 1000000

Jehan-Guillaume de Rorthais jgdr at dalibo.com
Tue May 19 05:53:20 EDT 2015

On Tue, 28 Apr 2015 11:23:19 +0200
Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote:

> On Tue, 28 Apr 2015 13:37:05 +1000
> Andrew Beekhof <andrew at beekhof.net> wrote:
> > > On 27 Apr 2015, at 11:10 pm, Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
> > > wrote:
> > > 
> > >>> A solution we were discussing with my colleague was to be able to break
> > >>> the current transition during the pre-promote and make sure a new
> > >>> transition is computed where pre-promote is called again.
> > 
> > Realistically, this is not going to happen in the next few years.
> > 
> > Regardless of the idea’s merits, it's a major change to one of our core
> > assumptions. Beyond the initial implementation, the fallout will last for
> > months and I just don’t have that kind of bandwidth.
> Well, we were looking for a solution with the current implementation of
> Pacemaker anyway :)
> If it's not possible to gently tell the CRM that it should call pre-promote
> again, then roughly breaking the transition is fine enough for us.

We tried to complete the whole election process in a single call of
pre-promote. During that call, the node-to-be-promoted is in charge of
connecting to all the other PostgreSQL instances to check whether there is a
better candidate. If it finds one, it changes the scores by calling crm_master.
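
A minimal sketch of that election step, to make the idea concrete. It is not
our actual PoC code: the function names, the score values (1001/1000) and the
tie-breaking rule are assumptions made for illustration. It only builds the
crm_master command lines; a real resource agent would still have to collect
the LSNs from the other instances and execute the commands.

```python
from typing import Dict, List


def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to an integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def elect(lsns: Dict[str, str]) -> str:
    """Return the node with the most advanced LSN.

    Ties are resolved in favour of the lowest node name (an arbitrary,
    hypothetical rule: max() keeps the first maximum it encounters).
    """
    return max(sorted(lsns), key=lambda node: parse_lsn(lsns[node]))


def score_commands(lsns: Dict[str, str],
                   best_score: int = 1001,
                   other_score: int = 1000) -> List[List[str]]:
    """Build one crm_master command per node (score values are assumptions)."""
    winner = elect(lsns)
    return [
        ["crm_master", "--node", node, "--update",
         str(best_score if node == winner else other_score)]
        for node in sorted(lsns)
    ]


if __name__ == "__main__":
    # slave1 is lagging, slave2 is the best candidate.
    for cmd in score_commands({"slave1": "0/3000060", "slave2": "0/3000148"}):
        print(" ".join(cmd))
```

Keeping the decision in pure functions means the election logic can be checked
without a running cluster.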

It kind of worked, but not as fast as we hoped. This PoC showed that the
transition was broken AFTER the first promotion, not after the pre-promote
actions were all collected. Thus, with slave1 being the lagging slave and
slave2 the best candidate, we had:

  * slave1 promoted
  * slave1 demoted
  * slave2 promoted

This is actually a really bad scenario for us. We might still have the log
files and transition files.

Is it because crm_master was called from the designated
node-to-be-promoted?

Is it possible to make sure the transition is broken as soon as the
score changes?

Looking at the mysql RA, it seems they set the master score (at least) from the
pre-promote notify action as well.

> So, what exactly is a transient attribute? How could we create or set such
> attribute? Is it possible?
> > The idea is that by doing it in the monitor[1] op, you ensure you’re always
> > in a position to do a promotion.
> > By all means query attrd from the promote and/or pre-promote operations to
> > ensure that the chosen node is still the correct one though.
> We are unsure about the difference between querying/setting an attribute
> using crm_attribute and querying/setting an attribute with attrd.
> What is the difference? How can we make sure all the nodes have updated their
> attribute before taking a decision? How do we set/query an attribute in attrd?
> attr_updater?
> > Give the pre-promote a decent timeout and it can also act as your "waiting
> > for writes to come in and all LSNs to be updated” buffer.
> > 
> > 
> > [1] Strictly speaking, it could be any action name you dream up and tell the
> > cluster to call on a recurring basis. Given that monitor is already defined
> > and being called repeatedly, most people take the path of least resistance
> > and use that (one less thing for an admin to mess up).
> Our main goal was to keep the promotion negotiation going, without
> interruption, for as long as the slaves did not agree with each other about
> who the new master is. Without waiting for another round of monitor.
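
The suggestion quoted above, using the pre-promote timeout as a buffer until
every node has published its LSN, could be sketched roughly as follows. This
is only an illustration under assumptions: wait_for_lsns and its read_lsn
callback are hypothetical names, and how read_lsn actually obtains the value
(for instance by querying attrd) is deliberately left out.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def wait_for_lsns(nodes: Iterable[str],
                  read_lsn: Callable[[str], Optional[str]],
                  timeout: float = 30.0,
                  interval: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> Dict[str, str]:
    """Poll until every node has published an LSN or the timeout expires.

    read_lsn(node) is a hypothetical callback returning the published LSN,
    or None if the node has not reported yet. The result only contains the
    nodes that answered in time, so the caller can detect an incomplete set.
    """
    pending = list(nodes)
    deadline = clock() + timeout
    seen: Dict[str, str] = {}
    while True:
        for node in pending:
            if node not in seen:
                value = read_lsn(node)
                if value is not None:
                    seen[node] = value
        if len(seen) == len(pending) or clock() >= deadline:
            return seen
        sleep(interval)
```

The timeout passed here should stay below the timeout configured for the
notify action itself, otherwise the cluster would fail the action before the
loop gives up on its own.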

Jehan-Guillaume de Rorthais
