[ClusterLabs Developers] migrate-to and migrate-from for moving Master/Slave roles ?

Tue Dec 1 09:52:24 UTC 2015

Sorry, finger mess...sent the email too soon...

On Tue, 1 Dec 2015 10:40:40 +0100
Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote:

> On Tue, 1 Dec 2015 12:34:59 +0300
> Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> 
> > On Tue, Dec 1, 2015 at 12:08 PM, Jehan-Guillaume de Rorthais
> > <jgdr at dalibo.com> wrote:
> > > On Tue, 1 Dec 2015 06:36:35 +0300
> > > Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> > >
> > >> 26.11.2015 03:52, Jehan-Guillaume de Rorthais wrote:
> > >> > Hi guys,
> > >> >
> > >> > While working on our pgsqlms agent[1], we are now studying how to
> > >> > control all the steps of a switchover process from the resource agent.
> > >> >
> > >> > The tricky part here is the 2nd step of a successful switchover with
> > >> > PostgreSQL (9.3+):
> > >> >   (1) shutdown the master first
> > >> >   (2) make sure the designated slave received **everything** from the
> > >> > old master
> > >>
> > >> I am not familiar with PG, but it sounds backwards. Once master
> > >> (replication source) is shut down, there is no way to verify anything on
> > >> slave (replication target) side.
> > >
> > > Once the master is shut down, the slaves are still running, so we can
> > > check whatever we want on them.
> > >
> > >> Is there any way to tell PG to "prepare to switch" and wait until it is
> > >> complete on demote?
> > >
> > > Demoting a master in PG is: shutdown -> start as slave.
> > >
> > >> Or do you mean waiting until slave finished replaying pending
> > >> replication stream? In this case I expect it should be possible to check
> > >> on slave side (something like "we have 5 files to replay left")?
> > >
> > > Yes, that is what I mean.
> > >
> > > In a normal situation, the master (PG 9.3+) will wait for its standbys to
> > > receive everything, then do a "shutdown checkpoint" which is streamed to
> > > the slaves as well. At this point, slaves are aware the master did a clean
> > > shutdown.
> > >
> > > During a switchover, we **must** check that the new master received the
> > > old master's "shutdown checkpoint". If promotion occurs before this xlog
> > > record, the old master will not be able to replicate from the new master.
> > >
> > 
> > If PG waits for standbys to "receive everything", how is it possible
> > that slave is promoted too early? Pacemaker should wait for demote to
> > complete and demote will wait for slaves to get everything. At least
> > that is what follows from your explanation. I am probably missing
> > something here.
> 
> As explained below, a network issue or moving the master IP address is enough
> to break this. I have been bitten by the latter during tests when setting up
> colocation without asymmetrical order (i.e. promote/start IP and demote/stop
> IP).

In this situation, the master will notice its standbys have left and finish its
shutdown alone.

> > > During this shutdown window, any kind of network issue or just a wrong
> > > setup (like the master IP being moved **before** the demote) will prevent
> > > a clean switchover, and the old master will never catch up with the new one.
> > 
> > What would be the correct action in this case? Block promoting of slave?

Yes. And promote the old master.

> > I think it may be possible to use notifications here. If demoting was
> > announced and master was active at this point, you know pacemaker
> > intended to stop master and so should check for completion. Although I
> > admit I do not know which notifications are sent for failed resource
> > and for failed node.

Notifications are not good enough here. As you point out, during uncontrolled
switch/failover scenarios, we are not sure what kind of notifications we
will receive. Moreover, in these situations, we do want to promote the slave even
if it did not receive everything from the master.

It seems pretty fragile to me to rely on notifications here.
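
To make the rule discussed in this thread concrete, here is a minimal sketch of
the promotion decision, not the actual pgsqlms code. The function names, the
`controlled_switchover` flag and the `shutdown_checkpoint_end_lsn` parameter are
hypothetical; the sketch assumes the agent already knows the standby's last
received WAL location and the location just past the old master's shutdown
checkpoint record, both in PostgreSQL's usual "hi/lo" hexadecimal LSN notation.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def may_promote(standby_received_lsn: str,
                shutdown_checkpoint_end_lsn: str,
                controlled_switchover: bool) -> bool:
    """Decide whether promoting the designated standby is acceptable.

    Assumption: shutdown_checkpoint_end_lsn is the location just past the
    old master's shutdown checkpoint record.
    """
    if not controlled_switchover:
        # Uncontrolled failover: promote even if the standby missed data.
        return True
    # Controlled switchover: the standby must have received the whole
    # shutdown checkpoint record, otherwise promotion should be blocked.
    return parse_lsn(standby_received_lsn) >= parse_lsn(shutdown_checkpoint_end_lsn)
```

When `may_promote()` returns False during a controlled switchover, the outcome
described above applies: block the promotion of the slave and promote the old
master again.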