[Pacemaker] Adding VIP support for the MySQL RA

Sun Nov 13 03:05:22 CET 2011

On 11-11-12 03:56 PM, Michael Marrotte wrote:
> An N+1 or N+X topology might be good for that cascading scenario...  
> Find a sweet spot for the evict date.
> If slaves are lagging too much, scale and tune.

Hi,

That's plainly not always possible, since MySQL replication is
single-threaded: a slave applies the master's changes serially, however
many connections wrote them on the master.  That's _the_ major issue with
MySQL replication and, although it is a pretty hot topic, nobody has
really succeeded in addressing this problem.
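
As a back-of-the-envelope illustration of why tuning alone may not help,
here is a small Python sketch.  It assumes constant rates, and the function
name and the numbers are made up for illustration, not measurements from
any real system.  As long as the master commits faster than the single
applier thread on a slave can replay, the backlog only grows, and adding
more slaves does not change the per-slave apply rate.

```python
def lag_after(seconds: float, master_write_rate: float,
              slave_apply_rate: float, backlog: float = 0.0) -> float:
    """Return the replication backlog (in events) after `seconds`.

    Both rates are in events per second and are assumed constant
    (a simplification).  The backlog can never drop below zero.
    """
    growth = (master_write_rate - slave_apply_rate) * seconds
    return max(0.0, backlog + growth)


if __name__ == "__main__":
    # Hypothetical peak: master commits 1200 ev/s, slave replays 1000 ev/s.
    behind = lag_after(600, master_write_rate=1200, slave_apply_rate=1000)
    print(behind)
    # Hypothetical quiet period: the slave needs time to drain the backlog.
    print(lag_after(240, master_write_rate=500, slave_apply_rate=1000,
                    backlog=behind))
```

The second call is the important one: a slave can only catch up if it keeps
replicating while the write load is lower.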

>
> I haven't read Yves' patch, but I'll check it out.  I just saw that he
> was looking for slaves to work with a VIP and suggested a couple of ways
> I've seen it work.

The way you suggested will work well for a relatively low-load database
system, but since servers are stopped when they lag too much, larger and
busier installations cannot accept that.  Sometimes _one_ bad query
hitting a slave can cause it to lag because of locking and/or intensive
disk I/O.  To counter that, people use many slaves.
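
For contrast, the VIP-based handling discussed in this thread (move the
slave's virtual IP away while it lags, hand it back once it has caught up,
never stop mysqld) could look roughly like the Python sketch below.  This is
not the code from the patch or from the resource agent, which is a shell
script.  The class, the function and both thresholds are hypothetical, and
using two thresholds instead of one is my own assumption to keep the VIP
from flapping.  The lag value would come from something like
Seconds_Behind_Master in SHOW SLAVE STATUS.

```python
from dataclasses import dataclass


@dataclass
class SlaveState:
    has_vip: bool = True         # is the reader VIP currently on this slave?
    mysqld_running: bool = True  # never changed below: MySQL keeps running


def update_vip(state: SlaveState, lag_seconds: float,
               max_lag: float = 60.0, ok_lag: float = 10.0) -> SlaveState:
    """Decide VIP placement from replication lag without stopping mysqld.

    max_lag and ok_lag are hypothetical thresholds: the VIP is dropped
    above max_lag and only handed back once lag is at or below ok_lag.
    """
    if state.has_vip and lag_seconds > max_lag:
        state.has_vip = False    # clients go elsewhere, replication goes on
    elif not state.has_vip and lag_seconds <= ok_lag:
        state.has_vip = True     # caught up, caches are still warm
    return state


if __name__ == "__main__":
    slave = SlaveState()
    for lag in (5, 120, 40, 8):
        slave = update_vip(slave, lag)
        print(lag, slave.has_vip, slave.mysqld_running)
```

The slave loses its VIP at a lag of 120 seconds, stays out of rotation at
40, and gets the VIP back at 8, with mysqld running the whole time.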

>
> On Sat, Nov 12, 2011 at 2:51 PM, Florian Haas <florian at hastexo.com>
> wrote:
>
>     Hi Yves and Michael,
>
>     On 2011-11-12 19:22, Yves Trudeau wrote:
>     > lol... How many large databases have you managed?  Once evicted,
>     > MySQL will be restarted by Pacemaker so all the caches will be cold.
>
>     If I may say so, before you start laughing at people on the list, it
>     may be a good idea to actually get your facts straight and check what
>     evict_outdated_slaves does. For a too-far-behind slave it bails out of
>     monitor with $OCF_ERR_INSTALLED, which Pacemaker considers a hard
>     error. Thus, that instance will _not_ be restarted by Pacemaker on
>     this node unless an administrator intervenes.
>
>     Still, Michael, Yves has a point that evict_outdated_slaves is not
>     optimal (and I'm saying this as the guy that wrote that part of the
>     agent). It's fine for a temporary problem that affects a single slave,
>     but please consider this scenario:
>
>     - High load on the database, across several instances.
>     - Slaves start lagging behind.
>     - We shut down a slave that is too far behind.
>     - We now have _fewer_ instances to handle the same load.
>     - Slaves fall further behind.
>     - We shut down more slaves.
>
>     This can turn into a cascading failure. Note, specifically, that the
>     lagging slave has no real option to catch up even when the database
>     isn't being hammered anymore, unless an admin has intervened and
>     recovered/restarted the instance manually. And, of course, Yves' point
>     about cold caches is entirely valid.
>
>     In Yves' approach, we wouldn't shut down MySQL, but merely shift away
>     the slave's virtual IP. So while clients can't connect to the slave
>     via its virtual IP anymore, the slave can still fetch updates from the
>     master -- and thus, actually has a chance to catch up. Once it's
>     sufficiently caught up, it gets the VIP back and clients can talk to
>     that slave again. And since we never stopped MySQL, we also don't have
>     the cold cache problem.
>
>     Yves' patches are not perfect (and they're not expected to be, that's
>     what a review is for), but I think his approach is sound and shouldn't
>     be shot down simply because evict_outdated_slaves is already there.
>
>     Cheers,
>     Florian
>
>     --
>     Need help with High Availability?
>     http://www.hastexo.com/now
>
>     _______________________________________________
>     Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>     <mailto:Pacemaker at oss.clusterlabs.org>
>     http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
>     Project Home: http://www.clusterlabs.org
>     Getting started:
>     http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>     Bugs:
>     http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>
>
