[Pacemaker] Adding VIP support for the MySQL RA
Yves Trudeau
y.trudeau at videotron.ca
Sun Nov 13 03:05:22 CET 2011
On 11-11-12 03:56 PM, Michael Marrotte wrote:
> An N+1 or N+X topology might be good for that cascading scenario...
> Find a sweet spot for the evict date.
> If slaves are lagging too much, scale and tune.
Hi,
That's plainly not always possible since MySQL replication is
single-threaded. That's _the_ major issue with MySQL replication and,
although it is a pretty hot topic, nobody has really succeeded in
addressing this problem.
>
> I haven't read Yves' patch, but I'll check it out. I just saw that he
> was looking for slave to work with VIP and suggested a couple ways
> I've seen it work.
The way you suggested will work well for a relatively low-load database
system, but since servers are stopped when they lag too much, larger
and busier installations cannot accept that. Sometimes _one_ bad query
hitting a slave can cause it to lag because of locking and/or intensive
disk I/O. To counter that, people use many slaves.
>
> On Sat, Nov 12, 2011 at 2:51 PM, Florian Haas <florian at hastexo.com>
> wrote:
>
> Hi Yves and Michael,
>
> On 2011-11-12 19:22, Yves Trudeau wrote:
> > lol... How many large databases have you managed? Once evicted, MySQL
> > will be restarted by Pacemaker so all the caches will be cold.
>
> If I may say so, before you start laughing at people on the list, it may
> be a good idea to actually get your facts straight and check what
> evict_outdated_slaves does. For a too-far-behind slave it bails out of
> monitor with $OCF_ERR_INSTALLED, which Pacemaker considers a hard
> error.
> Thus, that instance will _not_ be restarted by Pacemaker on this node
> unless an administrator intervenes.
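To make the eviction behaviour concrete, here is a minimal Python sketch
of the decision described above. It is not the agent's code (the agent
itself is a shell script); monitor_slave, max_lag and the evict flag are
names assumed for illustration. The numeric values are the standard OCF
return codes.

```python
# Standard OCF return codes (only the two needed here).
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5  # Pacemaker treats this as a hard error on the node


def monitor_slave(seconds_behind, max_lag, evict=True):
    """Hypothetical monitor check for a running slave.

    seconds_behind: replication lag in seconds, or None if unknown.
    max_lag:        lag (seconds) above which the slave is evicted.
    evict:          stands in for the evict_outdated_slaves switch.
    """
    if evict and seconds_behind is not None and seconds_behind > max_lag:
        # A hard error: the instance is not restarted on this node
        # until an administrator cleans up the failure.
        return OCF_ERR_INSTALLED
    return OCF_SUCCESS
```

With max_lag=60, a slave that is 120 seconds behind would return
OCF_ERR_INSTALLED, while the same slave with evict=False would keep
returning OCF_SUCCESS.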
>
> Still, Michael, Yves has a point that evict_outdated_slaves is not
> optimal (and I'm saying this as the guy that wrote that part of the
> agent). It's fine for a temporary problem that affects a single slave,
> but please consider this scenario:
>
> - High load on the database, across several instances.
> - Slaves start lagging behind.
> - We shut down a slave that is too far behind.
> - We now have _fewer_ instances to handle the same load.
> - Slaves fall further behind.
> - We shut down more slaves.
>
> This can turn into a cascading failure. Note, specifically, that the
> lagging slave has no real option to catch up even when the database
> isn't being hammered anymore, unless an admin has intervened and
> recovered/restarted the instance manually. And, of course, Yves' point
> about cold caches is entirely valid.
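The cascade above can be illustrated with a toy simulation. This is only
a sketch under simplifying assumptions that are mine, not from the
thread: load is spread evenly over the remaining slaves, each slave has
a fixed capacity, lag grows by the amount of overload per tick, and a
slave is evicted for good once its lag exceeds max_lag. The function
name and all numbers are made up.

```python
def simulate(capacities, total_load, max_lag, ticks):
    """Return a list of (tick, slave_index) evictions.

    capacities: per-slave load each slave can absorb per tick.
    total_load: load per tick, shared evenly by the active slaves.
    max_lag:    lag above which a slave is shut down (evicted).
    """
    lag = {i: 0.0 for i in range(len(capacities))}
    active = set(lag)
    evictions = []
    for tick in range(1, ticks + 1):
        if not active:
            break  # nothing left to serve the load
        share = total_load / len(active)
        # Lag grows with overload and shrinks (never below 0) otherwise.
        for i in sorted(active):
            lag[i] = max(0.0, lag[i] + share - capacities[i])
        # Evict every slave that is now too far behind.
        for i in sorted(active):
            if lag[i] > max_lag:
                active.discard(i)
                evictions.append((tick, i))
    return evictions


if __name__ == "__main__":
    print(simulate([100, 110, 120, 130], 440, 30, 20))
```

In this model, four slaves under a load of 440 lose the weakest slave
first, and each eviction pushes the survivors further over capacity
until none are left. With a load of 400 nothing is evicted at all.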
>
> In Yves' approach, we wouldn't shut down MySQL, but merely shift away
> the slave's virtual IP. So while clients can't connect to the slave via
> its virtual IP anymore, the slave can still fetch updates from the
> master -- and thus, actually has a chance to catch up. Once it's
> sufficiently caught up, it gets the VIP back and clients can talk to
> that slave again. And since we never stopped MySQL, we also don't have
> the cold cache problem.
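The VIP-shifting idea above boils down to a small decision rule, which
might be sketched in Python as follows. This is not taken from the
patch: the function name, the two thresholds, and the use of a lower
"recover" threshold (to keep the VIP from flapping) are assumptions for
illustration. How the decision is then conveyed to Pacemaker is left
out entirely.

```python
def vip_should_be_assigned(has_vip, seconds_behind, max_lag, recover_lag):
    """Decide whether a slave should hold its reader VIP.

    has_vip:        whether the slave currently holds the VIP.
    seconds_behind: replication lag in seconds, or None if unknown.
    max_lag:        lag above which the VIP is taken away.
    recover_lag:    lag at or below which the VIP is given back
                    (assumed to be <= max_lag).
    """
    if seconds_behind is None:
        # Lag unknown (e.g. replication stopped): do not serve reads.
        return False
    if has_vip:
        return seconds_behind <= max_lag
    # MySQL keeps running and replicating; only the VIP is withheld
    # until the slave has sufficiently caught up.
    return seconds_behind <= recover_lag


def replay(lags, max_lag, recover_lag, has_vip=True):
    """Apply the rule to a sequence of lag samples."""
    states = []
    for lag in lags:
        has_vip = vip_should_be_assigned(has_vip, lag, max_lag, recover_lag)
        states.append(has_vip)
    return states
```

For example, replaying the lag samples 5, 70, 50, 20, 8 with max_lag=60
and recover_lag=10 takes the VIP away at 70 and only returns it at 8.
MySQL is never stopped in between, so the caches stay warm.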
>
> Yves' patches are not perfect (and they're not expected to be, that's
> what a review is for), but I think his approach is sound and shouldn't
> be shot down simply because evict_outdated_slaves is already there.
>
> Cheers,
> Florian
>
> --
> Need help with High Availability?
> http://www.hastexo.com/now
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker