[Pacemaker] Primitive stuck after resource agent failure?

Serge Dubrouski sergeyfd at gmail.com
Mon Feb 21 11:01:25 EST 2011


On Mon, Feb 21, 2011 at 5:30 AM, Lars Marowsky-Bree <lmb at novell.com> wrote:
> On 2011-02-18T12:28:31, Jody McIntyre <jodym at trustcentric.com> wrote:
>
>> I considered this, but unfortunately it would take a lot of effort.  The
>> existing "pgsql" resource agent is designed to start and stop postgres,
>> whereas in this mode I want to switch an already running postgres into
>> master mode.  Even worse, once postgres is in master mode, there is no
>> automated way to switch it back to standby mode (and writing one would
>> require considerable effort.)
>
> Well, you don't need it. I assume there's still a way to stop it,
> though, and a failed demote will result in a stop.

There was a patch from Takatoshi MATSUO that did exactly that: it
stopped Postgres completely on the "demote" operation. I am not sure
this is a good approach for implementing M/S functionality. It works
well in disaster scenarios, where you don't care what happens to the
failed master, but not for maintenance, where you just need to move
the master role from one node to another.

>
> So adding the master/slave functionality to the pg agent seems to make
> sense still.

Sure it does. The main question is how.

>
> However, if there are fundamental conceptual problems with the m/s
> implementation of our RA and the PG approach (and perhaps mysql?) we may
> want to also extend the design of the m/s. Got any thoughts on that?

The main problem with PG M/S, as I see it, is that whenever a failover
happens, meaning a slave is promoted to the master role using a trigger
file, the new master increments its Timeline, a kind of incarnation
number for the database. After that, the only way to make the old
master a slave of the new master is to restore it from a fresh backup
of the new master. I haven't had a chance to check what effect this
Timeline change has on other slaves, if any are active at the time of
failover.
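
For anyone who wants to check this on their own nodes, here is a rough
sketch of comparing Timeline IDs. It assumes the text output of
pg_controldata has been captured on each node; the function names and
the abbreviated sample text are made up for illustration, and the
"differs means fresh backup" rule simply encodes the behaviour
described above, not a complete test.

```python
import re


def parse_timeline_id(controldata_output: str) -> int:
    """Return the timeline ID found in pg_controldata output text."""
    match = re.search(
        r"^Latest checkpoint's TimeLineID:\s*(\d+)\s*$",
        controldata_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("no TimeLineID line found in pg_controldata output")
    return int(match.group(1))


def needs_fresh_backup(old_master_tli: int, new_master_tli: int) -> bool:
    """Assumption taken from the mail above: once the timelines differ,
    the old master has to be rebuilt from a backup of the new master."""
    return old_master_tli != new_master_tli


# Abbreviated, illustrative samples (not captured from a real cluster).
OLD_MASTER = """\
Database cluster state:               shut down
Latest checkpoint's TimeLineID:       1
"""
NEW_MASTER = """\
Database cluster state:               in production
Latest checkpoint's TimeLineID:       2
"""

if __name__ == "__main__":
    old_tli = parse_timeline_id(OLD_MASTER)
    new_tli = parse_timeline_id(NEW_MASTER)
    print(old_tli, new_tli, needs_fresh_backup(old_tli, new_tli))
```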

What's interesting, though, is that if you don't use the PG trigger
file to switch roles, and instead simply stop and start PG while
switching roles (removing or recreating the recovery.conf file), the
Timeline does not increase and switching roles works well. It
definitely takes longer to promote a new master, but it provides more
flexibility. Maybe we should implement M/S for PG that way?
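
A minimal sketch of that stop/start role switch, to make the idea
concrete. It is not a resource agent and not a tested procedure. It
assumes the recovery file is recovery.conf in the data directory (the
PostgreSQL 9.0 name, which the text above appears to refer to), that
pg_ctl is on the PATH, and that PG is currently running. The function
names and the connection string in the usage note are hypothetical.

```python
import subprocess
from pathlib import Path


def render_recovery_conf(primary_conninfo: str) -> str:
    """Build a minimal standby recovery.conf (no quote escaping done)."""
    return (
        "standby_mode = 'on'\n"
        f"primary_conninfo = '{primary_conninfo}'\n"
    )


def switch_role(data_dir, role, primary_conninfo="", run=subprocess.check_call):
    """Switch roles by restarting PG instead of using a trigger file.

    role is "master" or "slave". `run` executes one command given as an
    argv list; it is injectable so the steps can be dry-run or logged.
    """
    if role not in ("master", "slave"):
        raise ValueError(f"unknown role: {role!r}")
    if role == "slave" and not primary_conninfo:
        raise ValueError("a slave needs primary_conninfo")

    data_dir = Path(data_dir)
    recovery_conf = data_dir / "recovery.conf"

    # 1. Stop PG cleanly (pg_ctl fails here if PG is not running).
    run(["pg_ctl", "-D", str(data_dir), "stop", "-m", "fast"])

    # 2. Recreate recovery.conf for a slave, remove it for a master.
    if role == "slave":
        recovery_conf.write_text(render_recovery_conf(primary_conninfo))
    elif recovery_conf.exists():
        recovery_conf.unlink()

    # 3. Start PG again in its new role and wait for startup.
    run(["pg_ctl", "-D", str(data_dir), "start", "-w"])
```

For a dry run, pass run=print so the pg_ctl commands are only printed,
for example switch_role(data_dir, "slave", "host=node2 port=5432",
run=print) with data_dir pointing at a scratch directory. Note that
recovery.conf is still written or removed in that case.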

>
>
> Regards,
>    Lars
>
> --
> Architect Storage/HA, OPS Engineering, Novell, Inc.
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde



-- 
Serge Dubrouski.



