[ClusterLabs Developers] migrate-to and migrate-from for moving Master/Slave roles ?

Fri Dec 4 08:11:10 EST 2015

On Wed, 2 Dec 2015 14:02:23 +1100
Andrew Beekhof <andrew at beekhof.net> wrote:

> 
> > On 26 Nov 2015, at 11:52 AM, Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
> > wrote:
> > 
> > Hi guys,
> > 
> > While working on our pgsqlms agent[1], we are now studying how to control
> > all the steps of a switchover process from the resource agent. 
> > 
> > The tricky part here is the 2nd step of a successful switchover with
> > PostgreSQL (9.3+):
> >  (1) shutdown the master first
> >  (2) make sure the designated slave received **everything** from the old
> > master
> 
> How can you achieve (2) if (1) has already occurred?

This check consists of validating the last transaction log entry the slave
received. It must be the "shutdown checkpoint" from the old master.
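
As an illustration of that check, here is a minimal Python sketch. It assumes
the WAL records received by the slave have already been decoded into simple
objects; the `WalRecord` fields and the function name are hypothetical and are
not taken from pgsqlms or from the output of any PostgreSQL tool.

```python
from typing import List, NamedTuple


class WalRecord(NamedTuple):
    """Hypothetical, simplified view of one decoded WAL record."""
    lsn: str          # location of the record in the transaction log
    rmgr: str         # resource manager that wrote the record
    description: str  # free-form description of the record


def received_shutdown_checkpoint(records: List[WalRecord]) -> bool:
    """Return True only if the LAST received record is a shutdown checkpoint."""
    if not records:
        return False
    last = records[-1]
    desc = last.description.lower()
    # Assumption: a shutdown checkpoint is an XLOG record whose description
    # mentions both "checkpoint" and "shutdown".
    return last.rmgr == "XLOG" and "checkpoint" in desc and "shutdown" in desc
```

The point is that only the very last record matters: a shutdown checkpoint
followed by anything else means the old master wrote more than the slave
should promote on.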

> There’s no-one for the designated slave to talk to in the case of errors...

I was explaining the steps for a successful switchover in PostgreSQL, outside
of Pacemaker. Sorry for the confusion if it wasn't clear enough :/

This is currently done by hand. Should an error occur (the slave did not
receive the shutdown checkpoint of the master), the human operator simply
restarts/promotes the master and the slave goes back to replicating from it.
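
The manual procedure, steps (1) to (4) of the list quoted in this mail
together with the rollback just described, could be sketched as follows. This
is only an outline under assumptions: every callable is a hypothetical
placeholder for whatever the operator actually runs, not a real PostgreSQL or
Pacemaker API.

```python
from typing import Callable


def manual_switchover(
    stop_master: Callable[[], None],
    slave_got_shutdown_checkpoint: Callable[[], bool],
    promote_slave: Callable[[], None],
    start_old_master_as_slave: Callable[[], None],
    restart_master: Callable[[], None],
) -> str:
    """Run the four switchover steps, rolling back if step (2) fails."""
    stop_master()                            # (1) shut down the master first
    if not slave_got_shutdown_checkpoint():  # (2) did the slave get everything?
        restart_master()                     # rollback: old master stays master
        return "rolled back"
    promote_slave()                          # (3) promote the designated slave
    start_old_master_as_slave()              # (4) old master comes back as slave
    return "switched over"
```

Nothing irreversible happens before step (2) succeeds, which is why the
rollback is just a restart of the old master.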

> >  (3) promote the designated slave as master
> >  (4) start the old master as slave
> 
> (4) is pretty tricky.  Assuming you use master/slave, it’s supposed to be in
> this state already after the demote in step (1).

Back to Pacemaker and our RA. A demote in PostgreSQL is really a stop + start as
slave. So after a demote, as the master actually stopped and restarted as
slave, the designated slave to be promoted must have the "shutdown checkpoint"
from the old master in its transaction log.

> If you’re just using clones,
> then you’re in even more trouble because pacemaker either wouldn’t have
> stopped it or won’t want to start it again.

We are using stateful clones with the master/slave role.
During a Pacemaker "move" (what I call a switchover), the resource is demoted
on the source node and promoted on the destination one.  Considering a demote in
PostgreSQL is a stop/start (as slave), we are fine with (1), (3) and (4):

(1) the demote did stop the old master (and restarted it as slave)
(3) the designated slave is promoted 
(4) the old master connects to the new master

About (4), as the old master is restarted as a slave in (1), it just waits
until it can connect to the new master while (2) and (3) occur. The trigger
might be either the "master IP address" that finally appears or some setup in
the "post promote" notification, etc.

> See more below.
> 
> > As far as we understand Pacemaker, the migrate-to and migrate-from
> > capabilities allow us to distinguish whether we are moving a resource
> > because of a failure or as part of a controlled switchover. Unfortunately,
> > these capabilities are ignored for cloned and multi-state resources…
> 
> Yeah, this isn’t really the right use-case.
> You need to be looking more at the promote/demote cycle.
> 
> If you turn on notifications, then in a graceful switchover (eg. the node is
> going into standby) you will get information about which node has been
> selected to become the new master when calling demote on the old master.
> Perhaps you could ensure (2) while performing (1).

Our RA is already working. It already uses promote/demote notifications. See

  https://github.com/dalibo/pgsql-resource-agent/blob/master/multistate/script/pgsqlms

But I fail to understand how I could distinguish, even from notifications, a
failing scenario from a move/switchover one.

During a failure on the master, Pacemaker will first try to demote it and even
fence the node if needed. In the notification, I will receive the same
information as during a move, won't I?

Or maybe you are thinking about comparing the active/master/slave/stop/inactive
resources from the notifications between the pre- and post-demote to deduce
whether the old master is still alive as a slave [1]? In this scenario, I
suppose we would have to keep the name of the old master in a private attribute
on the designated slave to be promoted, to compare the states of the old master?

[1] https://github.com/ClusterLabs/pacemaker/blob/master/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt#L942
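
To make that idea concrete, here is a minimal Python sketch of such a
comparison on a single notification. It reads the
`OCF_RESKEY_CRM_meta_notify_*_uname` variables described in [1]; the heuristic
itself (a switchover demotes one node, promotes another and does not stop the
demoted one) and the function names are assumptions for illustration, not
documented Pacemaker behaviour. Whether a failure really looks different from
a move in these variables is exactly the open question here.

```python
from typing import Mapping, Set

PREFIX = "OCF_RESKEY_CRM_meta_notify_"


def _nodes(env: Mapping[str, str], key: str) -> Set[str]:
    """Split a space-separated notify variable into a set of node names."""
    return set(env.get(PREFIX + key, "").split())


def looks_like_switchover(env: Mapping[str, str]) -> bool:
    """Guess whether the notified transition is a controlled switchover."""
    demoted = _nodes(env, "demote_uname")
    promoted = _nodes(env, "promote_uname")
    stopped = _nodes(env, "stop_uname")
    # Assumed heuristic: some node is demoted, a different node is promoted,
    # and no demoted node is also scheduled to be stopped.
    return bool(demoted) and bool(promoted - demoted) and not (demoted & stopped)
```

In an agent this would be called with the process environment (`os.environ`
in Python) from the pre-demote or pre-promote notification.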

> It’s not ideal, but you could have (4) happen in the post-promote notification.
> Notify actions aren’t /supposed/ to change resource state but it has been
> done before.

Step (4) is fine: there is no problem with it and no need to mess with it.
Again, sorry for the confusion.

I am sure we can find a workaround to this problem, but it seems to me that it
requires some struggling and wrestling with the code to bend it to what we are
trying to achieve.

I thought using migrate-to/migrate-from would have made for much cleaner,
almost self-documenting code compared to more conditional blocks with complex
manipulation and computation (e.g. dealing with arrays of nodes to compare
states during pre/post demote).