[ClusterLabs Developers] problem with master score limited to 1000000

Tue May 19 19:27:41 EDT 2015

> On 19 May 2015, at 7:53 pm, Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote:
> 
> On Tue, 28 Apr 2015 11:23:19 +0200
> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote:
> 
>> On Tue, 28 Apr 2015 13:37:05 +1000
>> Andrew Beekhof <andrew at beekhof.net> wrote:
>> 
>>>> On 27 Apr 2015, at 11:10 pm, Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
>>>> wrote:
>>>> 
>>>>>> A solution we were discussing with my colleague was to be able to break
>>>>>> the current transition during the pre-promote and make sure a new
>>>>>> transition is computed where pre-promote is called again.
>>> 
>>> Realistically, this is not going to happen in the next few years.
>>> 
>>> Regardless of the idea’s merits, it’s a major change to one of our core
>>> assumptions. Beyond the initial implementation, the fallout will last for
>>> months and I just don’t have that kind of bandwidth.
>> 
>> Well, we were looking for a solution with the current implementation of
>> Pacemaker anyway :)
>> 
>> If it's not possible to gently tell the CRM that it should call pre-promote
>> again, then roughly breaking the transition is good enough for us.
> 
> We tried to complete the whole election process in a single call of
> pre-promote. During the call of pre-promote, the node-to-be-promoted is in
> charge of connecting to all other postgresql instances to check if there is a
> better candidate. If it finds a better one, it changes the scores by calling
> crm_master.
> 
> It kinda worked, but not as fast as we hoped. This PoC showed that the
> transition was broken AFTER the first promotion, not after the pre-promote
> actions were all collected.

Sounds about right.
That’s why I wasn’t suggesting this as a foolproof approach - because you don’t get precise control over where processing stops.
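
For what it's worth, the selection step described above (compare every standby's LSN, then adjust the scores) can be sketched as a pure function. This is only an illustration, not code from any resource agent: the function names, the example LSN values and the score values (1001 for the winner, just under 1000 for the rest) are all assumptions. Applying the result would still be a separate crm_master call per node, with the timing caveats discussed in this thread.

```python
def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN in its 'hi/lo' hexadecimal text form to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def rank_candidates(lsns: dict[str, str]) -> list[str]:
    """Return node names ordered from most advanced LSN to least advanced.

    Ties are broken by node name so the result is deterministic.
    """
    return sorted(lsns, key=lambda node: (-parse_lsn(lsns[node]), node))


def master_scores(lsns: dict[str, str], top: int = 1001, base: int = 1000) -> dict[str, int]:
    """Map each node to a hypothetical master score.

    The most advanced node gets `top`; every other node gets `base` minus
    its rank, floored at 1 so it remains promotable as a fallback.
    """
    scores = {}
    for rank, node in enumerate(rank_candidates(lsns)):
        scores[node] = top if rank == 0 else max(base - rank, 1)
    return scores


if __name__ == "__main__":
    # Made-up LSNs: slave1 is the lagging standby, slave2 the best candidate.
    reported = {"slave1": "0/3000060", "slave2": "0/3000148"}
    print(rank_candidates(reported))
    print(master_scores(reported))
```

Keeping the ranking separate from the code that pushes scores makes it possible to run the same check again from promote, to confirm the chosen node is still the right one.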

> Thus, slave1 being the lagging slave and slave2 the
> best candidate, we had:
> 
>  * slave1 promoted
>  * slave1 demoted
>  * slave2 promoted
> 
> This is actually a really bad scenario for us. We might still have the log
> files and transition files.
> 
> Is it because crm_master was called from the designated
> node-to-be-promoted?

Nope. It’s because there is scope for lots of things to happen between the update being sent and noticed.
It’s also possible that the transition is hard-wired to run action X if the action X_pre_notifys were invoked.

> 
> Is it possible to make sure the transition breakage happens as soon as the
> score changes?

The only way to guarantee it is to allow notifications to fail.

> 
> Looking at the mysql RA, it seems they set the master score (at least) from the
> pre-promote notify action as well.
> 
>> So, what exactly is a transient attribute? How could we create or set such
>> an attribute? Is it possible?
>> 
>>> The idea is that by doing it in the monitor[1] op, you ensure you’re always
>>> in a position to do a promotion.
>>> By all means query attrd from the promote and/or pre-promote operations to
>>> ensure that the chosen node is still the correct one though.
>> 
>> We are unsure about the difference between querying/setting an attribute
>> using crm_attribute and querying/setting an attribute with attrd.
>> 
>> What is the difference? How can we make sure all the nodes have updated
>> their attributes before taking a decision? How do we set/query an attribute
>> in attrd? With attrd_updater?
>> 
>>> Give the pre-promote a decent timeout and it can also act as your “waiting
>>> for writes to come in and all LSNs to be updated” buffer.
>>> 
>>> 
>>> [1] Strictly speaking, it could be any action name you dream up and tell the
>>> cluster to call on a recurring basis. Given that monitor is already defined
>>> and being called repeatedly, most people take the path of least resistance
>>> and use that (one less thing for an admin to mess up).
>> 
>> Our main goal was to keep the promotion negotiation going, without
>> interruption, for as long as the slaves did not agree with each other about
>> who is the new master, and without waiting for another round of monitor.
> 
> 
> Regards,
> -- 
> Jehan-Guillaume de Rorthais
> Dalibo
> http://www.dalibo.com