[ClusterLabs Developers] migrate-to and migrate-from for moving Master/Slave roles ?

Fri Dec 11 07:40:05 UTC 2015

(Sorry for the top post)

Thank you all for your time, answers and advice. They are much appreciated.

I have no bandwidth to process your input for the next few days or weeks
(work, and moving to a new city), and my colleague is overwhelmed as well :(

We'll get back to the list soon with some feedback about our attempt to
implement your advice.

Season's greetings all!

On Wed, 9 Dec 2015 18:04:47 -0600,
Ken Gaillot <kgaillot at redhat.com> wrote:

> On 12/08/2015 05:52 AM, Andrei Borzenkov wrote:
> > On Fri, Dec 4, 2015 at 4:11 PM, Jehan-Guillaume de Rorthais
> > <jgdr at dalibo.com> wrote:
> > 
> >>
> >> But I fail to understand how I could distinguish, even from notifications,
> >> a failing scenario from a move/switchover one.
> >>
> > 
> > On demote, fetch the current log position and store it in a cluster
> > attribute. On promote, fetch the previous master's position, wait until
> > the current instance has caught up, and delete the attribute. If the
> > attribute is not present on promote, the master was down, so do not wait
> > and proceed.
> > 
> > If you set a transient attribute, the cluster will forget about the
> > previous master on restart. If you set a persistent attribute, it will
> > allow you to ensure that no data loss has (automatically) occurred, even
> > on cluster restart.
> > 
> > Where do you envision problems here?
> 
> This is more or less what was suggested in the original post :) and
> after discussing this some more, I tend to agree with this approach
> (using an attribute, as opposed to clone notifications, or the proposed
> migration support for the master role).
> 
> The demote action would set an attribute. It would be best to use a
> private attribute (attrd_updater --private --update), so setting it
> doesn't trigger further pacemaker activity. Since the attribute is set
> by demote, it will work whether the move is initiated by the cluster or
> externally (by a sysadmin). To initiate it manually, you can set a
> negative location constraint for the master role on the current master.
> 
> The promote action would check for that attribute (attrd_updater
> --private --query --all). If it exists, then it's an orderly handover,
> and it should wait for the replication checkpoint. On success, remove
> the attribute. There should be a timeout on the waiting (less than the
> timeout for the promote operation as a whole), for when there is a
> network issue during the transfer. You could decide whether a timeout
> means "grab the master role immediately" or "fail the promote".
> 
> I do see the logical appeal of migrate_to/migrate_from for the master
> role, but that would be a long-term project.
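
The demote/promote handover described in the quoted messages can be sketched
roughly as follows. This is only an illustration under stated assumptions, not
a resource agent: the attribute name "handover-position", the AttributeStore
class, the return strings and the on_timeout parameter are all hypothetical,
and log positions are modelled as plain integers. A real agent would store the
value with attrd_updater (as suggested above) instead of an in-memory dict.
Removing the attribute when a timeout still ends in promotion is also an
assumption of this sketch.

```python
import time
from typing import Callable, Dict, Optional

ATTR_NAME = "handover-position"  # hypothetical attribute name


class AttributeStore:
    """In-memory stand-in for a private node attribute (illustration only)."""

    def __init__(self) -> None:
        self._values: Dict[str, int] = {}

    def set(self, name: str, value: int) -> None:
        self._values[name] = value

    def get(self, name: str) -> Optional[int]:
        return self._values.get(name)

    def delete(self, name: str) -> None:
        self._values.pop(name, None)


def demote(store: AttributeStore, current_position: int) -> None:
    """On demote, record the old master's log position for the next promote."""
    store.set(ATTR_NAME, current_position)


def promote(
    store: AttributeStore,
    local_position: Callable[[], int],
    wait_timeout: float,
    on_timeout: str = "fail",  # "fail" or "promote": the policy choice
    poll_interval: float = 0.01,
) -> str:
    """On promote, wait for catch-up only if a demote left the attribute."""
    target = store.get(ATTR_NAME)
    if target is None:
        # No attribute: the previous master went down, so do not wait.
        return "promoted-no-handover"

    # wait_timeout should be shorter than the promote operation's own timeout.
    deadline = time.monotonic() + wait_timeout
    while local_position() < target:
        if time.monotonic() >= deadline:
            if on_timeout == "promote":
                store.delete(ATTR_NAME)
                return "promoted-after-timeout"
            return "failed"  # attribute is kept for a later attempt
        time.sleep(poll_interval)

    # Orderly handover succeeded: remove the attribute.
    store.delete(ATTR_NAME)
    return "promoted-after-catchup"
```

With this shape, the choice between "grab the master role immediately" and
"fail the promote" is a single parameter, so either policy can be tried
without touching the waiting loop.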