[ClusterLabs Developers] migrate-to and migrate-from for moving Master/Slave roles ?

Tue Dec 1 10:18:07 UTC 2015

On Mon, 30 Nov 2015 15:04:37 -0600
Ken Gaillot <kgaillot at redhat.com> wrote:

> On 11/25/2015 06:52 PM, Jehan-Guillaume de Rorthais wrote:
> > Hi guys,
> > 
> > While working on our pgsqlms agent[1], we are now studying how to control
> > all the steps of a switchover process from the resource agent. 
> > 
> > The tricky part here is the 2nd step of a successful switchover with
> > PostgreSQL (9.3+):
> >   (1) shut down the master first
> >   (2) make sure the designated slave received **everything** from the old
> >       master
> >   (3) promote the designated slave as master
> >   (4) start the old master as slave
> > 
> > As far as we understand Pacemaker, the migrate-to and migrate-from
> > capabilities allow us to distinguish whether we are moving a resource
> > because of a failure or for a controlled switchover. Unfortunately, these
> > capabilities are ignored for cloned and multi-state resources...
> > 
> > Because of this restriction, the resource agent code currently cannot tell
> > whether it should check that the designated slave received everything from
> > the old master (controlled switchover) or not (we lost the master). In the
> > case of a controlled switchover, if the designated slave did not receive
> > everything from the master, we must abort the switchover.
> > 
> > A workaround we could imagine would be to set a special cluster attribute
> > manually (using crm_attribute) to signal to the agent that we are about to
> > perform a controlled switchover.
> > 
> > But I bet the cleaner way would be to use the migrate-to and migrate-from
> > capabilities. Did we miss something about them? Is there any plan to
> > support moving a Master/Slave role using migrate-to and migrate-from at
> > some point? Any other proposals or ideas?
> > 
> > [1] see "multistate" folder in
> > https://github.com/dalibo/pgsql-resource-agent
> 
> Per the documentation, clones can't migrate:
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/_migrating_resources.html
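Step (2) of the switchover quoted above reduces to comparing two WAL positions. The following Python sketch is illustrative only and is not taken from pgsqlms. It assumes the old master's final WAL position (for instance the "Latest checkpoint location" that pg_controldata reports after a clean shutdown) and the standby's received position (for instance from pg_last_xlog_receive_location() on 9.x) have already been fetched as text. The helper names are hypothetical, and treating "received at least up to that position" as sufficient is a simplification.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a 64-bit integer."""
    high, low = lsn.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def standby_caught_up(master_last_lsn: str, standby_received_lsn: str) -> bool:
    """Hypothetical helper: True if the standby received WAL at least up to
    the old master's last position. Fetching both values is left to the caller."""
    return parse_lsn(standby_received_lsn) >= parse_lsn(master_last_lsn)


def check_switchover(master_last_lsn: str, standby_received_lsn: str) -> None:
    """Step (2): abort the controlled switchover if the standby is behind."""
    if not standby_caught_up(master_last_lsn, standby_received_lsn):
        raise RuntimeError(
            f"standby only received up to {standby_received_lsn}, "
            f"master stopped at {master_last_lsn}: aborting switchover"
        )


if __name__ == "__main__":
    print(standby_caught_up("0/3000028", "0/30000A0"))  # True
    print(standby_caught_up("1/0", "0/FFFFFFFF"))        # False
```

Parsing the `X/Y` hex pair into an integer matters: comparing the text forms directly would wrongly rank 0/9000000 above 0/10000000.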

Yes, we were aware of this. We did a quick test because we were wondering
whether this restriction applied only to anonymous clones (which makes sense)
or to all of them, including unique and stateful ones (too bad for us :)).

> It would be nice to support migration for globally unique clones, and
> the master role of stateful clones. Feel free to submit a feature
> request with what you think the interface should look like.

Thanks, I will.

> The attribute approach is interesting, but it would be limited to moves
> initiated outside the cluster,

sure

> and I suspect error handling would be problematic (what if someone forgets to
> unset the attribute? what if one part of the process fails?).

Exactly, this is tricky. But as the controlled switchover is initiated by
humans, if something goes wrong the attribute should be removed by hand. If the
switchover succeeds, the RA can remove it itself at the end of the migrate-from.
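The attribute workaround can be sketched as a small decision function. This is a hypothetical Python sketch, not pgsqlms code: it assumes the RA has already read the value of an operator-set cluster attribute (set by hand with crm_attribute, None when absent) and already knows whether the designated slave caught up, for example via a check like step (2). It encodes the lifecycle described above: promote without checking when the master was lost, abort a controlled switchover when data is missing, and clear the flag only on success.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    PROMOTE = "promote"  # go ahead and promote the designated slave
    ABORT = "abort"      # controlled switchover, but the slave is behind


def decide_promotion(switchover_flag: Optional[str], slave_caught_up: bool) -> Decision:
    """Choose what the promote action should do.

    switchover_flag: value of a hypothetical operator-set cluster attribute,
    or None when it is not set.
    """
    if switchover_flag is None:
        # No flag: we lost the master, so promote without the catch-up check.
        return Decision.PROMOTE
    if slave_caught_up:
        # Controlled switchover and the slave has everything: safe to promote.
        return Decision.PROMOTE
    # Controlled switchover but data is missing: refuse to promote.
    return Decision.ABORT


def flag_after(decision: Decision, switchover_flag: Optional[str]) -> Optional[str]:
    """The RA clears the flag itself on success; on abort it is left for a human."""
    if decision is Decision.PROMOTE:
        return None
    return switchover_flag
```

Leaving the flag in place on abort matches the idea that a human cleans up after a failed switchover, but it also illustrates the risk raised above: a forgotten flag would make a later, genuine master loss go through the catch-up check and possibly abort.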

> I'm not sure how other db RAs deal with the situation; that would be
> worth looking into.

Oracle RA is not stateful. 

The MySQL RA seems to have different promote/demote constraints. After a quick
look, it seems to me it performs the actual demote/promote during
notifications. I'm not sure this RA tries to handle a clean switchover, nor
whether it needs to.