[ClusterLabs Developers] migrate-to and migrate-from for moving Master/Slave roles?

Jehan-Guillaume de Rorthais jgdr at dalibo.com
Fri Dec 4 09:23:13 EST 2015


On Fri, 4 Dec 2015 16:49:04 +0300
Andrei Borzenkov <arvidjaar at gmail.com> wrote:

> On Fri, Dec 4, 2015 at 4:11 PM, Jehan-Guillaume de Rorthais
> <jgdr at dalibo.com> wrote:
> > On Wed, 2 Dec 2015 14:02:23 +1100
> > Andrew Beekhof <andrew at beekhof.net> wrote:
> >
> >>
> >> > On 26 Nov 2015, at 11:52 AM, Jehan-Guillaume de Rorthais
> >> > <jgdr at dalibo.com> wrote:
> >> >
> >> > Hi guys,
> >> >
> >> > While working on our pgsqlms agent[1], we are now studying how to control
> >> > all the steps of a switchover process from the resource agent.
> >> >
> >> > The tricky part here is the 2nd step of a successful switchover with
> >> > PostgreSQL (9.3+):
> >> >  (1) shutdown the master first
> >> >  (2) make sure the designated slave received **everything** from the old
> >> > master
> >>
> >> How can you achieve (2) if (1) has already occurred?
> >
> > This check consists of validating the last transaction log entry the slave
> > received. It must be the "shutdown checkpoint" from the old master.
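
For illustration, here is a minimal sketch of that check in Python, assuming
the pg_xlogdump tool shipped with PostgreSQL 9.3+ is available in the PATH;
the WAL directory and segment name are placeholders, and a real agent would
first locate the last segment the slave actually received:

    import subprocess

    def ends_with_shutdown_checkpoint(wal_dir, segment):
        """Return True if the last record of the given WAL segment is the
        shutdown checkpoint the old master wrote during its clean stop."""
        out = subprocess.run(["pg_xlogdump", "-p", wal_dir, segment],
                             capture_output=True, text=True).stdout
        records = [line for line in out.splitlines()
                   if line.startswith("rmgr:")]
        if not records:
            return False
        # pg_xlogdump renders a shutdown checkpoint as an XLOG-rmgr record
        # whose description ends with "shutdown"; a checkpoint taken on a
        # running server says "online" instead.
        last = records[-1].rstrip()
        return "XLOG" in last and last.endswith("shutdown")
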
> >
> >> There’s no-one for the designated slave to talk to in the case of errors...
> >
> > I was explaining the steps for a successful switchover in PostgreSQL,
> > outside of Pacemaker. Sorry for the confusion if it wasn't clear enough :/
> >
> > This is currently done by hand. Should an error occur (the slave did not
> > receive the shutdown checkpoint of the master), the human operator simply
> > restarts/promotes the master and the slave resumes its replication from it.
> >
> >> >  (3) promote the designated slave as master
> >> >  (4) start the old master as slave
> >>
> >> (4) is pretty tricky.  Assuming you use master/slave, it's supposed to be in
> >> this state already after the demote in step (1).
> >
> > Back to Pacemaker and our RA. A demote in PostgreSQL is really a stop +
> > start as slave. So after a demote, as the master actually stopped and
> > restarted as a slave, the designated slave to be promoted must have the
> > old master's "shutdown checkpoint" in its transaction log.
> >
> >> If you’re just using clones,
> >> then you’re in even more trouble because pacemaker either wouldn’t have
> >> stopped it or won’t want to start it again.
> >
> > We are using stateful clones with the master/slave role.
> > During a Pacemaker "move" (what I call a switchover), the resource is
> > demoted on the source node and promoted on the destination one. Considering
> > a demote in PostgreSQL is a stop/start (as slave), we are fine with (1),
> > (3) and (4):
> >
> > (1) the demote stopped the old master (and restarted it as a slave)
> > (3) the designated slave is promoted
> > (4) the old master connects to the new master
> >
> > About (4): as the old master is restarted as a slave in (1), it just waits
> > until it can connect to the new master while (2) and (3) occur. The trigger
> > might be either the "master IP address" that finally appears or some setup
> > in the "post promote" notification, etc.
> >
> >> See more below.
> >>
> >> > As far as we understand Pacemaker, the migrate-to and migrate-from
> >> > capabilities allow us to distinguish whether we are moving a resource
> >> > because of a failure or for a controlled switchover. Unfortunately,
> >> > these capabilities are ignored for cloned and multi-state resources…
> >>
> >> Yeah, this isn’t really the right use-case.
> >> You need to be looking more at the promote/demote cycle.
> >>
> >> If you turn on notifications, then in a graceful switchover (e.g. the node
> >> is going into standby) you will get information about which node has been
> >> selected to become the new master when calling demote on the old master.
> >> Perhaps you could ensure (2) while performing (1).
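
Indeed, since the demote-time notification environment already names the
designated slave, the old master could verify, before stopping, that this
slave has flushed everything sent so far. A minimal sketch of that idea,
assuming the slave's application_name matches its node name and using the
9.3/9.4-era pg_stat_replication columns:

    import subprocess

    def slave_caught_up(slave_name):
        """Ask the still-running master whether the designated slave has
        flushed every WAL byte sent so far; quoting of slave_name is
        skipped for brevity."""
        sql = ("SELECT sent_location = flush_location"
               " FROM pg_stat_replication"
               " WHERE application_name = '%s'" % slave_name)
        out = subprocess.check_output(
            ["psql", "-A", "-t", "-X", "-c", sql], text=True)
        return out.strip() == "t"
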
> >
> > Our RA is already working. It already uses promote/demote notifications. See
> >
> >   https://github.com/dalibo/pgsql-resource-agent/blob/master/multistate/script/pgsqlms
> >
> > But I fail to understand how I could distinguish, even from notifications,
> > a failure scenario from a move/switchover one.
> >
> 
> Does it really matter?

Yes, it does.

> You have asynchronous replication. In case of involuntary failover you
> are bound to lose some in-flight transactions. If you accept it, I do
> not see why you care in case of voluntary failover. How is the
> situation worse than a sudden host crash a millisecond before you were
> ready to move the master to another host?

Automatic failover != controlled switchover.

It is supposed to be a controlled switchover. You planned a maintenance window
and everything for things to go smoothly. In this situation, you are not
supposed to lose anything. In case of failure or disaster recovery, the
failover plan includes an RPO. That does not mean this RPO applies during
normal operations. As long as we know how to make sure things go smoothly
without losing anything, we should take care of it.

What if I just need to switch over as many times as I have slaves, to be able
to upgrade their OS? Should I accept losing some transactions at each
switchover?

Our RA is now working. We are trying to make it stronger and to check as many
things as possible. We learned a lot about Pacemaker by working on this RA.

At the end of the day, maybe some arguments will show us our idea was the wrong
one; I am fine with that. But our goal is to contribute and give feedback to
the communities: both the Pacemaker AND PostgreSQL communities.

I still believe such a feature (the ability to call migrate* actions for
stateful clones) would give more power to RA devs, cleaner RA code, fewer bugs
and features++.




