[ClusterLabs Developers] migrate-to and migrate-from for moving Master/Slave roles ?

Tue Dec 1 04:34:59 EST 2015

On Tue, Dec 1, 2015 at 12:08 PM, Jehan-Guillaume de Rorthais
<jgdr at dalibo.com> wrote:
> On Tue, 1 Dec 2015 06:36:35 +0300
> Andrei Borzenkov <arvidjaar at gmail.com> wrote:
>
>> On 26.11.2015 03:52, Jehan-Guillaume de Rorthais wrote:
>> > Hi guys,
>> >
>> > While working on our pgsqlms agent[1], we are now studying how to control
>> > all the steps of a switchover process from the resource agent.
>> >
>> > The tricky part here is the 2nd step of a successful switchover with
>> > PostgreSQL (9.3+):
>> >   (1) shutdown the master first
>> >   (2) make sure the designated slave received **everything** from the old
>> > master
>>
>> I am not familiar with PG, but it sounds backwards. Once master
>> (replication source) is shut down, there is no way to verify anything on
>> slave (replication target) side.
>
> Once the master is shut down, the slaves are still running, so we can check
> whatever we want on them.
>
>> Is there any way to tell PG to "prepare to switch" and wait until it is
>> complete on demote?
>
> Demoting a master in PG is: shutdown -> start as slave.
>
>> Or do you mean waiting until slave finished replaying pending
>> replication stream? In this case I expect it should be possible to check
>> on slave side (something like "we have 5 files to replay left")?
>
> Yes, that is what I mean.
>
> In a normal situation, the master (PG 9.3+) will wait for its standbys to receive
> everything, then do a "shutdown checkpoint" which is streamed to the slaves as
> well. At this point, the slaves are aware the master did a clean shutdown.
>
> During a switchover, we **must** check that the new master received the old
> master's "shutdown checkpoint". If promotion occurs before this xlog record,
> the old master will not be able to replicate from the new master.
>

If PG waits for standbys to "receive everything", how is it possible
that a slave is promoted too early? Pacemaker should wait for demote to
complete and demote will wait for slaves to get everything. At least
that is what follows from your explanation. I am probably missing
something here.

> During this shutdown window, any kind of network issue or just a wrong setup
> (like the master IP being moved **before** the demote) will prevent a clean
> switchover, and the old master will never catch up with the new one.

What would be the correct action in this case? Block promotion of the slave?
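
If blocking is the answer, the check itself could be a plain LSN
comparison. Below is a minimal Python sketch, not taken from pgsqlms. The
function names are hypothetical. It assumes the agent already knows the
location of the old master's shutdown checkpoint (for instance the "Latest
checkpoint location" line printed by pg_controldata) and the location the
slave has received so far (for instance from
pg_last_xlog_receive_location()). Being past the start of the record is
only a necessary condition; a real agent would still have to confirm that
the complete record arrived.

```python
def parse_lsn(text):
    """Convert a PostgreSQL LSN such as '0/3000060' to a 64-bit integer.

    The textual form is two hexadecimal numbers separated by a slash:
    the high 32 bits, then the low 32 bits.
    """
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def received_past_checkpoint(checkpoint_lsn, received_lsn):
    """Return True if the slave received WAL beyond the start of the
    old master's shutdown checkpoint record (hypothetical helper)."""
    return parse_lsn(received_lsn) > parse_lsn(checkpoint_lsn)
```

Comparing the parsed integers rather than the raw strings matters, because
"0/FFFFFFFF" sorts after "1/0" as text but is the earlier location.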

I think it may be possible to use notifications here. If the demote was
announced and the master was active at that point, you know Pacemaker
intended to stop the master and so should check for completion. Although I
admit I do not know which notifications are sent for a failed resource
and for a failed node.
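
As a rough illustration of the notification idea, here is a minimal Python
sketch. It reads the OCF_RESKEY_CRM_meta_notify_* variables that Pacemaker
exports to the notify action of a resource with notify=true. The function
name is hypothetical, and the sketch says nothing about what is sent for a
failed resource or a failed node.

```python
import os


def planned_master_demotes(env=None):
    """Return the nodes Pacemaker announced it will demote while they are
    still listed as active masters (hypothetical helper).

    An empty list means this notification is not a pre-demote one, or no
    active master is being demoted.
    """
    if env is None:
        env = os.environ
    if env.get("OCF_RESKEY_CRM_meta_notify_type") != "pre":
        return []
    if env.get("OCF_RESKEY_CRM_meta_notify_operation") != "demote":
        return []
    # Both variables hold space-separated lists of node names.
    masters = set(env.get("OCF_RESKEY_CRM_meta_notify_master_uname", "").split())
    demoting = env.get("OCF_RESKEY_CRM_meta_notify_demote_uname", "").split()
    return [node for node in demoting if node in masters]
```

A slave that sees a non-empty result would know a clean demote was
intended and could insist on the completion check before accepting
promotion.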