[ClusterLabs Developers] migrate-to and migrate-from for moving Master/Slave roles ?

Tue Dec 1 09:40:40 UTC 2015

On Tue, 1 Dec 2015 12:34:59 +0300
Andrei Borzenkov <arvidjaar at gmail.com> wrote:

> On Tue, Dec 1, 2015 at 12:08 PM, Jehan-Guillaume de Rorthais
> <jgdr at dalibo.com> wrote:
> > On Tue, 1 Dec 2015 06:36:35 +0300
> > Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> >
> >> 26.11.2015 03:52, Jehan-Guillaume de Rorthais wrote:
> >> > Hi guys,
> >> >
> >> > While working on our pgsqlms agent[1], we are now studying how to control
> >> > all the steps of a switchover process from the resource agent.
> >> >
> >> > The tricky part here is the 2nd step of a successful switchover with
> >> > PostgreSQL (9.3+):
> >> >   (1) shutdown the master first
> >> >   (2) make sure the designated slave received **everything** from the old
> >> > master
> >>
> >> I am not familiar with PG, but it sounds backwards. Once master
> >> (replication source) is shut down, there is no way to verify anything on
> >> slave (replication target) side.
> >
> > Once the master is shut down, the slaves are still running, and we can
> > check whatever we want on them.
> >
> >> Is there any way to tell PG to "prepare to switch" and wait until it is
> >> complete on demote?
> >
> > Demoting a master in PG is: shutdown -> start as slave.
> >
> >> Or do you mean waiting until slave finished replaying pending
> >> replication stream? In this case I expect it should be possible to check
> >> on slave side (something like "we have 5 files to replay left")?
> >
> > Yes, that is what I mean.
> >
> > In a normal situation, the master (PG 9.3+) waits for its standbys to
> > receive everything, then performs a "shutdown checkpoint", which is
> > streamed to the slaves as well. At this point, the slaves know the master
> > did a clean shutdown.
> >
> > During a switchover, we **must** check that the new master received the
> > old master's "shutdown checkpoint". If promotion occurs before this xlog
> > record arrives, the old master will not be able to replicate from the new
> > master.
> >
> 
> If PG waits for standbys to "receive everything", how is it possible
> that the slave is promoted too early? Pacemaker should wait for demote to
> complete, and demote will wait for slaves to get everything. At least
> that is what follows from your explanation. I am probably missing
> something here.

As explained below, a network issue or moving the master IP address is enough
to break this. I have been bitten by the latter during tests, when setting up
colocation without an asymmetrical order (i.e. promote before starting the IP,
and demote before stopping the IP).
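
To illustrate the check discussed above (the new master must have received the
old master's shutdown checkpoint before it is promoted), here is a minimal
Python sketch. It is not pgsqlms code. It assumes the old master's
`pg_controldata` output and the standby's replay position (for instance from
`SELECT pg_last_xlog_replay_location()` on 9.3 to 9.6) have already been
collected. The function names and the sample values are hypothetical.

```python
import re
from typing import Optional

# Hypothetical excerpt of `pg_controldata` output taken on the stopped old master.
SAMPLE_CONTROLDATA = """\
Database cluster state:               shut down
Latest checkpoint location:           0/3000028
"""


def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' into a 64-bit integer."""
    high, low = lsn.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def shutdown_checkpoint_lsn(controldata: str) -> Optional[int]:
    """Return the latest checkpoint LSN if the cluster was cleanly shut down."""
    state = re.search(r"^Database cluster state:[ \t]+(.+)$", controldata, re.M)
    location = re.search(
        r"^Latest checkpoint location:[ \t]+(\S+)[ \t]*$", controldata, re.M
    )
    if state is None or location is None:
        return None
    if state.group(1).strip() != "shut down":
        # Not a clean shutdown: there is no shutdown checkpoint to wait for.
        return None
    return parse_lsn(location.group(1))


def standby_has_checkpoint(controldata: str, standby_replay_lsn: str) -> bool:
    """True if the standby has replayed past the old master's shutdown checkpoint.

    Assumes standby_replay_lsn is the *end* of the last replayed record, so it
    must be strictly greater than the checkpoint record's start location.
    """
    checkpoint = shutdown_checkpoint_lsn(controldata)
    if checkpoint is None:
        return False
    return parse_lsn(standby_replay_lsn) > checkpoint
```

With the sample above, a standby reporting "0/30000A0" passes the check, while
one still at "0/3000028" has not replayed the checkpoint record and should not
be promoted yet.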

> > During this shutdown window, any kind of network issue or just a wrong setup
> > (like the master IP being moved **before** the demote) will prevent a clean
> > switchover, and the old master will never catch up with the new one.
> 
> What would be the correct action in this case? Block promoting of slave?
> 
> I think it may be possible to use notifications here. If demoting was
> announced and the master was active at this point, you know Pacemaker
> intended to stop the master and so should check for completion. Although I
> admit I do not know which notifications are sent for failed resource
> and for failed node.
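
The notification idea could be sketched roughly like this in Python (an
illustration only, not pgsqlms code). It reads the
`OCF_RESKEY_CRM_meta_notify_*` environment variables that Pacemaker sets for
notify actions when the resource has `notify=true`. The function names are
hypothetical, and the sketch deliberately leaves open the question above of
which notifications are sent for a failed resource or a failed node.

```python
from typing import List, Mapping

# Pacemaker exports notification details to the agent as environment variables.
PREFIX = "OCF_RESKEY_CRM_meta_notify_"


def _unames(env: Mapping[str, str], key: str) -> List[str]:
    """Space-separated node list from a notify_*_uname variable."""
    return env.get(PREFIX + key, "").split()


def is_switchover_target(env: Mapping[str, str], local_node: str) -> bool:
    """True on a pre-promote notification for the node being promoted,
    when the same transition also announces a demote on another node.

    A promotion without an announced demote (e.g. after a node failure) is
    treated as a failover and returns False.
    """
    if env.get(PREFIX + "type") != "pre":
        return False
    if env.get(PREFIX + "operation") != "promote":
        return False
    demoted = _unames(env, "demote_uname")
    promoted = _unames(env, "promote_uname")
    return bool(demoted) and local_node in promoted and local_node not in demoted
```

In an agent this would be called with `os.environ` and the local node name; on
a True result, the agent would then wait for the shutdown checkpoint before
letting the promotion proceed.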

-- 
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com