[ClusterLabs Developers] problem with master score limited to 1000000

Tue May 19 09:53:20 UTC 2015

On Tue, 28 Apr 2015 11:23:19 +0200
Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote:

> On Tue, 28 Apr 2015 13:37:05 +1000
> Andrew Beekhof <andrew at beekhof.net> wrote:
> 
> > > On 27 Apr 2015, at 11:10 pm, Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
> > > wrote:
> > > 
> > >>> A solution we were discussing with my colleague was to be able to break
> > >>> the current transition during the pre-promote and make sure a new
> > >>> transition is computed where pre-promote is called again.
> > 
> > Realistically, this is not going to happen in the next few years.
> > 
> > Regardless of the idea’s merits, it’s a major change to one of our core
> > assumptions. Beyond the initial implementation, the fallout will last for
> > months and I just don’t have that kind of bandwidth.
> 
> Well, we were looking for a solution with the current implementation of
> Pacemaker anyway :)
> 
> If it's not possible to gently tell the CRM that it should call pre-promote
> again, then roughly breaking the transition is good enough for us.

We tried to complete the whole election process in a single call of
pre-promote. During the pre-promote call, the node-to-be-promoted is in
charge of connecting to all other PostgreSQL instances to check whether there
is a better candidate. If it finds a better one, it changes the scores by
calling crm_master.
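
To make the election step concrete, here is a minimal Python sketch. It assumes each standby's LSN has already been fetched as PostgreSQL's "HI/LO" hex string. The node names, LSN values and function names are hypothetical, and this is not the RA's actual code. It ranks candidates instead of using raw LSNs as scores, since Pacemaker treats 1000000 as INFINITY and LSN values exceed that very quickly.

```python
from typing import Dict

# Pacemaker treats 1000000 as INFINITY, so computed scores must stay below it.
SCORE_INFINITY = 1_000_000


def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as "16/B374D848" (two hex halves) to one integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def rank_scores(lsns: Dict[str, str], top: int = 1000) -> Dict[str, int]:
    """Map node name -> score: the most advanced standby gets `top`, the next top-1, etc.

    Ties on LSN are broken by node name so that every node computes the same ranking.
    """
    if not 0 < top < SCORE_INFINITY:
        raise ValueError("top score must be positive and below INFINITY")
    ordered = sorted(lsns, key=lambda node: (-parse_lsn(lsns[node]), node))
    return {node: top - rank for rank, node in enumerate(ordered)}


def best_candidate(lsns: Dict[str, str]) -> str:
    """Name of the node that should be promoted (highest LSN, lowest name on ties)."""
    scores = rank_scores(lsns)
    return max(scores, key=scores.__getitem__)
```

For example, rank_scores({"slave1": "0/3000060", "slave2": "0/3000148"}) returns {"slave2": 1000, "slave1": 999}. These are the values that would then be pushed with crm_master. How they are written for the other nodes is outside this sketch.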

It kinda worked, but not as fast as we hoped. This PoC showed that the
transition was broken AFTER the first promotion, not once all the pre-promote
actions had been collected. Thus, with slave1 being the lagging slave and
slave2 the best candidate, we got:

  * slave1 promoted
  * slave1 demoted
  * slave2 promoted

This is actually a really bad scenario for us. We might still have the log
files and transition files.

Is it because crm_master was called from the designated
node-to-be-promoted?

Is it possible to make sure the transition is broken as soon as the
score changes?

Looking at the mysql RA, it seems to set the master score from (at least) the
pre-promote notify action as well.
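
The mysql RA treats one notify call as the pre-promote one. Here is a small Python sketch of how an agent can detect that case. It relies on the notification environment variables Pacemaker sets for notify actions: OCF_RESKEY_CRM_meta_notify_type, OCF_RESKEY_CRM_meta_notify_operation and OCF_RESKEY_CRM_meta_notify_promote_uname. The function names are hypothetical, and the election logic itself is left out.

```python
import os
from typing import List, Mapping


def is_pre_promote(env: Mapping[str, str]) -> bool:
    """True when this notify action is the one sent before a promotion."""
    return (
        env.get("OCF_RESKEY_CRM_meta_notify_type") == "pre"
        and env.get("OCF_RESKEY_CRM_meta_notify_operation") == "promote"
    )


def nodes_to_promote(env: Mapping[str, str]) -> List[str]:
    """Node names the transition is about to promote (space-separated in the env)."""
    return env.get("OCF_RESKEY_CRM_meta_notify_promote_uname", "").split()


if __name__ == "__main__":
    if is_pre_promote(os.environ):
        # Hypothetical hook: this is where the election / crm_master update would go.
        print("pre-promote, targets:", nodes_to_promote(os.environ))
```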

> So, what exactly is a transient attribute? How could we create or set such
> an attribute? Is it possible?
> 
> > The idea is that by doing it in the monitor[1] op, you ensure you’re always
> > in a position to do a promotion.
> > By all means query attrd from the promote and/or pre-promote operations to
> > ensure that the chosen node is still the correct one though.
> 
> We are unsure about the difference between querying/setting an attribute
> using crm_attribute and querying/setting an attribute with attrd.
> 
> What is the difference? How can we make sure all the nodes have updated
> their attribute before making a decision? How do we set/query an attribute
> in attrd? With attrd_updater?
> 
> > Give the pre-promote a decent timeout and it can also act as your "waiting
> > for writes to come in and all LSNs to be updated" buffer.
> > 
> > 
> > [1] Strictly speaking, it could be any action name you dream up and tell the
> > cluster to call on a recurring basis. Given that monitor is already defined
> > and being called repeatedly, most people take the path of least resistance
> > and use that (one less thing for an admin to mess up).
> 
> Our main goal was to keep the promotion negotiation going as long as the
> slaves did not agree with each other about who the new master is, without
> interruption and without waiting for another round of monitor.
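
Here is a minimal Python sketch of the approach Andrew suggests. The recurring monitor keeps the local LSN published as a node attribute, and pre-promote only re-checks that the chosen node still holds the most advanced LSN. AttrStore is a hypothetical in-memory stand-in for attrd, and lsn_location is a made-up attribute name. A real agent would publish the value with something like `attrd_updater -n NAME -U VALUE`. How all nodes' values are read back depends on the Pacemaker version, so that part is not shown.

```python
from typing import Dict, List

LSN_ATTR = "lsn_location"  # hypothetical attribute name


def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN "HI/LO" (two hex halves) to one integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


class AttrStore:
    """Hypothetical in-memory stand-in for attrd: node -> {attribute: value}."""

    def __init__(self) -> None:
        self.values: Dict[str, Dict[str, str]] = {}

    def update(self, node: str, name: str, value: str) -> None:
        self.values.setdefault(node, {})[name] = value

    def query_all(self, name: str) -> Dict[str, str]:
        return {n: attrs[name] for n, attrs in self.values.items() if name in attrs}


def monitor(store: AttrStore, node: str, current_lsn: str) -> None:
    """Recurring op: keep this node's LSN published so a promotion can be decided at any time."""
    store.update(node, LSN_ATTR, current_lsn)


def pre_promote_ok(store: AttrStore, node: str) -> bool:
    """Re-check, right before promotion, that `node` still holds the most advanced LSN."""
    lsns = store.query_all(LSN_ATTR)
    if node not in lsns:
        return False
    best = max(parse_lsn(v) for v in lsns.values())
    return parse_lsn(lsns[node]) == best


def attrd_update_argv(name: str, value: str) -> List[str]:
    """Command line that would publish the value through attrd on the local node."""
    return ["attrd_updater", "-n", name, "-U", value]
```

With this split, the expensive comparison has already happened by the time pre-promote runs. A failed check there only means the attributes changed since the last monitor.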

Regards,
-- 
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com