[ClusterLabs Developers] migrate-to and migrate-from for moving Master/Slave roles ?

Mon Dec 7 01:11:08 UTC 2015

> On 5 Dec 2015, at 12:11 AM, Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote:
> 
> On Wed, 2 Dec 2015 14:02:23 +1100
> Andrew Beekhof <andrew at beekhof.net> wrote:
> 
>> 
>>> On 26 Nov 2015, at 11:52 AM, Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
>>> wrote:
>>> 
>>> Hi guys,
>>> 
>>> While working on our pgsqlms agent[1], we are now studying how to control
>>> all the steps of a switchover process from the resource agent. 
>>> 
>>> The tricky part here is the 2nd step of a successful switchover with
>>> PostgreSQL (9.3+):
>>> (1) shutdown the master first
>>> (2) make sure the designated slave received **everything** from the old
>>> master
>> 
>> How can you achieve (2) if (1) has already occurred?
> 
> This check consists of validating the last transaction log entry the slave
> received. It must be the "shutdown checkpoint" from the old master.
> 
>> There’s no-one for the designated slave to talk to in the case of errors...
> 
> I was explaining the steps for a successful switchover in PostgreSQL, outside
> of Pacemaker. Sorry for the confusion if it wasn't clear enough :/
> 
> This is currently done by hand. Should an error occur (the
> slave did not receive the shutdown checkpoint of the master), the human
> operator simply restarts/promotes the master and the slave gets back to
> replicating from it.

Why not do it as part of the promote action?
Loop until you see the checkpoint.
That's what galera does.

You may want the on-fail=block for the promote action though.
In galera the datastore usually ends up corrupted if you stop half-way through an rsync, so we tell pacemaker to leave it alone :-(
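
The "loop until you see the checkpoint" idea could be sketched roughly as below. This is only an illustration, not code from galera or pgsqlms: `saw_shutdown_checkpoint` and `do_promote` are hypothetical callables the agent would have to supply, and the timeout values are arbitrary. In a real agent the operation timeout configured in Pacemaker would also bound the wait.

```python
import time

# Standard OCF exit codes used by resource agents.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def promote_when_caught_up(saw_shutdown_checkpoint, do_promote,
                           timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll until the slave has seen the old master's shutdown checkpoint,
    then promote. Both callables are hypothetical hooks, not a real API."""
    deadline = time.monotonic() + timeout
    while not saw_shutdown_checkpoint():
        if time.monotonic() >= deadline:
            # Never saw the checkpoint: fail the promote without promoting.
            return OCF_ERR_GENERIC
        sleep(interval)
    do_promote()
    return OCF_SUCCESS
```

With on-fail=block on the promote operation, a non-zero return here would leave the resource untouched for an operator to inspect.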

> 
>>> (3) promote the designated slave as master
>>> (4) start the old master as slave
>> 
>> (4) is pretty tricky.  Assuming you use master/slave, it's supposed to be in
>> this state already after the demote in step (1).
> 
> Back to Pacemaker and our RA. A demote in PostgreSQL is really a stop + start as
> slave. So after a demote, as the master actually stopped and restarted as a
> slave, the designated slave to be promoted must have the "shutdown checkpoint"
> from the old master in its transaction log.
> 
>> If you’re just using clones,
>> then you’re in even more trouble because pacemaker either wouldn’t have
>> stopped it or won’t want to start it again.
> 
> We are using stateful clones with the master/slave role. 
> During a Pacemaker "move" (what I call a switchover), the resource is demoted
> on the source node and promoted on the destination one.  Considering a demote in
> PostgreSQL is a stop/start (as slave), we are fine with (1), (3) and (4):
> 
> (1) the demote did stop the old master (and restarted it as slave)
> (3) the designated slave is promoted 
> (4) the old master connects to the new master
> 
> About (4), as the old master is restarted as a slave in (1), it just waits
> until it can connect to the new master while (2) and (3) occur. The trigger
> might be either the "master IP address" that finally appears or some setup in
> the "post promote" notification, etc.
> 
>> See more below.
>> 
>>> As far as we understand Pacemaker, the migrate-to and migrate-from capabilities
>>> allow us to distinguish whether we are moving a resource because of a failure
>>> or for a controlled switchover. Unfortunately, these capabilities
>>> are ignored for cloned and multi-state resources…
>> 
>> Yeah, this isn’t really the right use-case.
>> You need to be looking more at the promote/demote cycle.
>> 
>> If you turn on notifications, then in a graceful switchover (eg. the node is
>> going into standby) you will get information about which node has been
>> selected to become the new master when calling demote on the old master.
>> Perhaps you could ensure (2) while performing (1).
> 
> Our RA is already working. It already uses promote/demote notifications. See
> 
>  https://github.com/dalibo/pgsql-resource-agent/blob/master/multistate/script/pgsqlms
> 
> But I fail to understand how I could distinguish, even from notifications, a
> failing scenario from a move/switchover one.
> 
> During a failure on the master, Pacemaker will first try to demote it and even
> fence the node if needed. In the notification, I will receive the same
> information as during a move, won't I?

not quite

> 
> Or maybe you think about comparing active/master/slave/stop/inactive resources
> from notification between the pre and post-demote to deduce if the old master
> is still alive as a slave [1]?

Right. If it's a migration, then the old master will appear in both $OCF_RESKEY_CRM_meta_notify_master_uname and $OCF_RESKEY_CRM_meta_notify_demote_uname but not in $OCF_RESKEY_CRM_meta_notify_stop_uname
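
As a rough sketch of that rule (not taken from pgsqlms, and not checked against every failure scenario), a notify handler could compare the three variables like this. The function name `is_switchover` is made up for illustration; it assumes the variables hold space-separated node names, as Pacemaker passes them to notify actions.

```python
def is_switchover(env):
    """Return True when some current master is being demoted but not stopped.

    `env` is a mapping such as os.environ. The helper name is hypothetical.
    """
    def nodes(name):
        # Notify variables are space-separated lists of node names.
        return set(env.get(name, "").split())

    masters = nodes("OCF_RESKEY_CRM_meta_notify_master_uname")
    demoted = nodes("OCF_RESKEY_CRM_meta_notify_demote_uname")
    stopped = nodes("OCF_RESKEY_CRM_meta_notify_stop_uname")

    # Old master(s): currently master and scheduled for demotion.
    old_masters = masters & demoted
    return bool(old_masters) and not (old_masters & stopped)
```

An agent would typically call it as `is_switchover(os.environ)` from its notify action.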

> In this scenario, I suppose we would have to keep
> the name of the old master in a private attribute in the designated slave to be
> promoted to compare the states of the old master?

OCF_RESKEY_CRM_meta_notify_master_uname should already have it

> 
> [1] https://github.com/ClusterLabs/pacemaker/blob/master/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt#L942
> 
>> It's not ideal, but you could have (4) happen in the post-promote notification.
>> Notify actions aren’t /supposed/ to change resource state but it has been
>> done before.
> 
> Step 4 is fine, no problem with it, no need to mess with it, again, sorry
> for the confusion.
> 
> I am sure we can probably find a workaround to this problem, but it seems to me
> it requires some struggling and wrestling in the code to bend it to what we try
> to achieve.
> 
> I thought using migrate-to/migrate-from would have made for much cleaner,
> almost self-documenting code compared to more conditional blocks with complex
> manipulation and computation (e.g. dealing with arrays of nodes to compare states
> during pre/post demote).

Far from it, migrate-to/migrate-from are incredibly complex inside pacemaker.