[ClusterLabs Developers] migrate-to and migrate-from for moving Master/Slave roles ?

Fri Dec 4 13:49:04 UTC 2015

On Fri, Dec 4, 2015 at 4:11 PM, Jehan-Guillaume de Rorthais
<jgdr at dalibo.com> wrote:
> On Wed, 2 Dec 2015 14:02:23 +1100
> Andrew Beekhof <andrew at beekhof.net> wrote:
>
>>
>> > On 26 Nov 2015, at 11:52 AM, Jehan-Guillaume de Rorthais <jgdr at dalibo.com>
>> > wrote:
>> >
>> > Hi guys,
>> >
>> > While working on our pgsqlms agent[1], we are now studying how to control
>> > all the steps of a switchover process from the resource agent.
>> >
>> > The tricky part here is the 2nd step of a successful switchover with
>> > PostgreSQL (9.3+):
>> >  (1) shutdown the master first
>> >  (2) make sure the designated slave received **everything** from the old
>> > master
>>
>> How can you achieve (2) if (1) has already occurred?
>
> This check consists of validating the last transaction log entry the slave
> received. It must be the "shutdown checkpoint" from the old master.
>
>> There’s no-one for the designated slave to talk to in the case of errors...
>
> I was explaining the steps for a successful switchover in PostgreSQL, outside
> of Pacemaker. Sorry for the confusion if it wasn't clear enough :/
>
> This is currently done by hand. Should an error occur (the slave did not
> receive the shutdown checkpoint of the master), the human operator simply
> restarts/promotes the master and the slave gets back to replicating from it.
>
>> >  (3) promote the designated slave as master
>> >  (4) start the old master as slave
>>
>> (4) is pretty tricky.  Assuming you use master/slave, it’s supposed to be in
>> this state already after the demote in step (1).
>
> Back to Pacemaker and our RA. A demote in PostgreSQL is really a stop + start as
> slave. So after a demote, as the master actually did stop and restart as a
> slave, the designated slave to be promoted must have the "shutdown checkpoint"
> in its transaction log from the old master.
>
>> If you’re just using clones,
>> then you’re in even more trouble because pacemaker either wouldn’t have
>> stopped it or won’t want to start it again.
>
> We are using stateful clones with the master/slave role.
> During a Pacemaker "move" (what I call a switchover), the resource is demoted
> on the source node and promoted on the destination one.  Considering a demote in
> PostgreSQL is a stop/start (as slave), we are fine with (1), (3) and (4):
>
> (1) the demote did stop the old master (and restarted it as slave)
> (3) the designated slave is promoted
> (4) the old master connects to the new master
>
> About (4), as the old master is restarted as a slave in (1), it just waits to
> be able to connect to the new master while (2) and (3) occur. What lets it
> connect might be either the "master IP address" that finally appears or some
> setup in the "post promote" notification, etc.
>
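
The ordering of steps (1) to (4) quoted above, together with the manual
fallback when the slave did not receive the shutdown checkpoint, can be
sketched roughly as follows. This is only an illustration of the control flow,
not code from pgsqlms: every callable name here is hypothetical, and how each
step is actually performed (stopping PostgreSQL, inspecting the last WAL
record, promoting) is left to the caller.

```python
from typing import Callable


def switchover(
    stop_master: Callable[[], None],
    slave_has_shutdown_checkpoint: Callable[[], bool],
    promote_slave: Callable[[], None],
    start_old_master_as_slave: Callable[[], None],
    restart_old_master_as_master: Callable[[], None],
) -> bool:
    """Run steps (1)-(4) in order; return True if the roles were swapped.

    All five callables are hypothetical hooks supplied by the caller.
    """
    stop_master()  # (1) clean shutdown of the current master

    # (2) the designated slave must have received everything, i.e. its last
    # transaction log entry is the old master's shutdown checkpoint.
    if not slave_has_shutdown_checkpoint():
        # Fallback described in the thread: bring the old master back in
        # its role; the slave resumes replicating from it.
        restart_old_master_as_master()
        return False

    promote_slave()              # (3) the designated slave becomes master
    start_old_master_as_slave()  # (4) the old master rejoins as a slave
    return True
```

For example, with stub callables that only record what was called:

```python
calls = []
swapped = switchover(
    lambda: calls.append("stop"),
    lambda: False,  # pretend the slave missed the shutdown checkpoint
    lambda: calls.append("promote"),
    lambda: calls.append("start_as_slave"),
    lambda: calls.append("restart_master"),
)
# swapped is False and calls == ["stop", "restart_master"]
```
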
>> See more below.
>>
>> > As far as we understand Pacemaker, migrate-to and migrate-from capabilities
>> > allow us to distinguish whether we are moving a resource because of a failure
>> > or for a controlled switchover. Unfortunately, these capabilities
>> > are ignored for cloned and multi-state resources…
>>
>> Yeah, this isn’t really the right use-case.
>> You need to be looking more at the promote/demote cycle.
>>
>> If you turn on notifications, then in a graceful switchover (eg. the node is
>> going into standby) you will get information about which node has been
>> selected to become the new master when calling demote on the old master.
>> Perhaps you could ensure (2) while performing (1).
>
> Our RA is already working. It already uses promote/demote notifications. See
>
>   https://github.com/dalibo/pgsql-resource-agent/blob/master/multistate/script/pgsqlms
>
> But I fail to understand how I could distinguish, even from notifications, a
> failing scenario from a move/switchover one.
>

Does it really matter?

You have asynchronous replication. In case of involuntary failover you
are bound to lose some in-flight transactions. If you accept that, I do
not see why you care in case of voluntary failover. How is the
situation worse than a sudden host crash a millisecond before you were
ready to move the master to another host?

> During a failure on master, Pacemaker will first try to demote it and even
> fence the node if needed. In the notification, I will receive the same
> information as during a move, won't I?
>
> Or maybe you are thinking of comparing the active/master/slave/stop/inactive
> resources from the notifications between the pre- and post-demote to deduce
> whether the old master is still alive as a slave [1]? In this scenario, I
> suppose we would have to keep the name of the old master in a private attribute
> on the designated slave to be promoted, to compare the states of the old master?
>
> [1] https://github.com/ClusterLabs/pacemaker/blob/master/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt#L942
>
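
One possible reading of the comparison suggested in the quoted paragraph, as a
rough sketch only. It assumes the notify action sees the documented
`OCF_RESKEY_CRM_meta_notify_*_uname` variables as space-separated node lists.
The rule itself (a node is demoted, a different node is promoted, and the
demoted node is not also being stopped) is an untested assumption, not
something Pacemaker guarantees, and I have not checked what these lists
contain when the old master is fenced. The function names are hypothetical.

```python
from typing import Mapping, Set

PREFIX = "OCF_RESKEY_CRM_meta_notify_"


def node_names(env: Mapping[str, str], key: str) -> Set[str]:
    """Return the node names listed in one notify variable (may be empty)."""
    return set(env.get(PREFIX + key, "").split())


def looks_like_switchover(env: Mapping[str, str]) -> bool:
    """Heuristic guess: is this transition a controlled move, not a failure?"""
    demoted = node_names(env, "demote_uname")
    promoted = node_names(env, "promote_uname")
    stopped = node_names(env, "stop_uname")
    # Assumption: in a move, the old master is demoted and stays up as a
    # slave, while another node is promoted. If the demoted node is also in
    # the stop list, treat the transition as a recovery instead.
    return bool(demoted) and bool(promoted - demoted) and not (demoted & stopped)
```

In an agent this would be called with `os.environ` from inside the notify
action; passing a plain mapping keeps the rule easy to try out by hand.
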
>> It’s not ideal, but you could have (4) happen in the post-promote notification.
>> Notify actions aren’t /supposed/ to change resource state but it has been
>> done before.
>
> Step 4 is fine, no problem with it, no need to mess with it. Again, sorry
> for the confusion.
>
> I am sure we can probably find a workaround to this problem, but it seems to me
> it requires some struggling and wrestling in the code to bend it to what we try
> to achieve.
>
> I thought using migrate-to/migrate-from would have been much cleaner code and
> almost self-documenting compared to some more conditional blocks with complex
> manipulation and computation (eg. dealing with arrays of nodes to compare states
> during pre/post demote).