An N+1 or N+X topology might be good for that cascading scenario... Find a sweet spot for the eviction threshold. If slaves are lagging too much, scale and tune. <br><br>I haven't read Yves' patch yet, but I'll check it out. I just saw that he was looking for a way to make slaves work with a VIP, and I suggested a couple of ways I've seen it work. <br>
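<br>A back-of-the-envelope version of the N+X idea, as a Python sketch under a hypothetical capacity model: load spreads evenly across slaves, and each slave can absorb a fixed amount of it while still keeping up with replication. The function names are made up for illustration.<br>

```python
import math


def slaves_needed(total_load: float, per_slave_capacity: float, spare: int) -> int:
    """Slaves required to carry total_load plus `spare` extra instances (N+X).

    Hypothetical model: load is spread evenly and each slave can absorb at
    most per_slave_capacity (e.g. queries/s) without falling behind.
    """
    if per_slave_capacity <= 0:
        raise ValueError("per_slave_capacity must be positive")
    n = math.ceil(total_load / per_slave_capacity)  # the "N" part
    return n + spare                                # plus the "X" spares


def survives_evictions(slaves: int, evicted: int, total_load: float,
                       per_slave_capacity: float) -> bool:
    """True if the slaves left after `evicted` removals can still carry the load."""
    remaining = slaves - evicted
    return remaining > 0 and remaining * per_slave_capacity >= total_load
```

For example, 900 units of load at 250 per slave needs 4 slaves, so N+1 means 5; in this model that pool still covers the load after one eviction, but not after two.<br>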
<br><div class="gmail_quote">On Sat, Nov 12, 2011 at 2:51 PM, Florian Haas <span dir="ltr">&lt;<a href="mailto:florian@hastexo.com">florian@hastexo.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Hi Yves and Michael,<br>
<div class="im"><br>
On 2011-11-12 19:22, Yves Trudeau wrote:<br>
> lol... How many large databases have you managed? Once evicted, MySQL<br>
> will be restarted by Pacemaker so all the caches will be cold.<br>
<br>
</div>If I may say so, before you start laughing at people on the list, it may<br>
be a good idea to actually get your facts straight and check what<br>
evict_outdated_slaves does. For a slave that is too far behind, it bails out<br>
of the monitor operation with $OCF_ERR_INSTALLED, which Pacemaker considers<br>
a hard error.<br>
Thus, that instance will _not_ be restarted by Pacemaker on this node<br>
unless an administrator intervenes.<br>
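<br>The decision described above can be modeled in a few lines of Python. This is an illustrative sketch, not the mysql agent's actual shell code; the parameter names mirror the agent's max_slave_lag and evict_outdated_slaves, the return code values are the standard OCF ones, and everything else (the SlaveStatus type, the function itself) is hypothetical.<br>

```python
from dataclasses import dataclass

# Standard OCF return codes.
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5  # a hard error: Pacemaker won't retry on this node


@dataclass
class SlaveStatus:
    seconds_behind_master: int  # as reported by SHOW SLAVE STATUS


def monitor_slave(status: SlaveStatus, max_slave_lag: int,
                  evict_outdated_slaves: bool) -> int:
    """Simplified, hypothetical model of the lag check in a slave's monitor."""
    too_far_behind = status.seconds_behind_master > max_slave_lag
    if too_far_behind and evict_outdated_slaves:
        # Hard error: the instance stays stopped on this node until an
        # administrator clears the failure.
        return OCF_ERR_INSTALLED
    # In this simplified model, every other case reports success.
    return OCF_SUCCESS
```

With max_slave_lag=3600, a slave two hours behind gets OCF_ERR_INSTALLED only when evict_outdated_slaves is enabled.<br>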
<br>
Still, Michael, Yves has a point that evict_outdated_slaves is not<br>
optimal (and I'm saying this as the guy that wrote that part of the<br>
agent). It's fine for a temporary problem that affects a single slave,<br>
but please consider this scenario:<br>
<br>
- High load on the database, across several instances.<br>
- Slaves start lagging behind.<br>
- We shut down a slave that is too far behind.<br>
- We now have _fewer_ instances to handle the same load.<br>
- Slaves fall further behind.<br>
- We shut down more slaves.<br>
<br>
This can turn into a cascading failure. Note, specifically, that the<br>
lagging slave has no real option to catch up even when the database<br>
isn't being hammered anymore, unless an admin has intervened and<br>
recovered/restarted the instance manually. And, of course, Yves' point<br>
about cold caches is entirely valid.<br>
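<br>The cascade can be reproduced with a deliberately crude toy model in Python. All of it is an assumption for illustration: read load is split evenly across active slaves, a slave whose share exceeds its hypothetical capacity falls further behind each tick, and an evicted slave stays down, as with the hard error described above.<br>

```python
def simulate(capacities, load_per_tick, ticks, max_lag):
    """Return the number of active slaves after each tick (toy model)."""
    lag = [0.0] * len(capacities)
    active = [True] * len(capacities)
    history = []
    for _ in range(ticks):
        n_active = sum(active)
        if n_active == 0:
            history.append(0)
            continue
        share = load_per_tick / n_active  # survivors absorb the evicted share
        for i, cap in enumerate(capacities):
            if not active[i]:
                continue  # evicted: MySQL is stopped, no chance to catch up
            # Lag grows when the share exceeds capacity, shrinks otherwise.
            lag[i] = max(0.0, lag[i] + share - cap)
        for i in range(len(capacities)):
            if active[i] and lag[i] > max_lag:
                active[i] = False  # hard error: stays down until an admin acts
        history.append(sum(active))
    return history


# Four slaves, one slightly too weak: its eviction overloads the rest.
print(simulate([100, 110, 120, 130], 440, 6, 25))       # [4, 4, 3, 1, 0, 0]
# Same load with a fifth slave: nobody falls behind in this model.
print(simulate([100, 110, 120, 130, 140], 440, 6, 25))  # [5, 5, 5, 5, 5, 5]
```
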
<br>
In Yves' approach, we wouldn't shut down MySQL, but merely shift away<br>
the slave's virtual IP. So while clients can't connect to the slave via<br>
its virtual IP anymore, the slave can still fetch updates from the<br>
master -- and thus, actually has a chance to catch up. Once it's<br>
sufficiently caught up, it gets the VIP back and clients can talk to<br>
that slave again. And since we never stopped MySQL, we also don't have<br>
the cold cache problem.<br>
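<br>A minimal Python sketch of that VIP-shifting idea, assuming two hypothetical thresholds: drop the VIP above one lag value, and give it back only once the lag is at or below a lower one, so the VIP doesn't flap around a single threshold. This is not Yves' patch; the class, its fields, and the hysteresis are illustrative assumptions.<br>

```python
from dataclasses import dataclass


@dataclass
class ReaderVip:
    """Hypothetical model: MySQL keeps replicating, only the VIP moves."""
    max_lag: int          # drop the VIP above this lag (seconds)
    catch_up_lag: int     # give the VIP back at or below this lag (seconds)
    has_vip: bool = True

    def __post_init__(self) -> None:
        if self.catch_up_lag > self.max_lag:
            raise ValueError("catch_up_lag must not exceed max_lag")

    def update(self, seconds_behind_master: int) -> bool:
        """Feed the latest lag reading; return whether the slave holds the VIP."""
        if self.has_vip and seconds_behind_master > self.max_lag:
            self.has_vip = False  # clients stop using this slave
        elif not self.has_vip and seconds_behind_master <= self.catch_up_lag:
            self.has_vip = True   # sufficiently caught up: clients may return
        return self.has_vip


vip = ReaderVip(max_lag=300, catch_up_lag=60)
print([vip.update(lag) for lag in [10, 400, 200, 100, 50, 70]])
# [True, False, False, False, True, True]
```

Because MySQL is never stopped, the slave keeps applying updates while it's off the VIP, which is what lets it reach catch_up_lag at all.<br>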
<br>
Yves' patches are not perfect (nor are they expected to be; that's<br>
what review is for), but I think his approach is sound and shouldn't<br>
be shot down simply because evict_outdated_slaves is already there.<br>
<br>
Cheers,<br>
Florian<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Need help with High Availability?<br>
<a href="http://www.hastexo.com/now" target="_blank">http://www.hastexo.com/now</a><br>
</font></span><div class="HOEnZb"><div class="h5"><br>
_______________________________________________<br>
Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>
<a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>
<br>
Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
Bugs: <a href="http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker" target="_blank">http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker</a><br>
</div></div></blockquote></div><br>