An N+1 or N+X topology might help with that cascading scenario: find a sweet spot for the eviction threshold, and if slaves are lagging too much, scale and tune.<br><br>I haven't read Yves' patch, but I'll check it out. I just saw that he was looking for a way to make slaves work with a VIP and suggested a couple of ways I've seen it work.<br>

<br><div class="gmail_quote">On Sat, Nov 12, 2011 at 2:51 PM, Florian Haas <span dir="ltr">&lt;<a href="mailto:florian@hastexo.com">florian@hastexo.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

Hi Yves and Michael,<br>

<div class="im"><br>

On 2011-11-12 19:22, Yves Trudeau wrote:<br>

> lol... How many large databases have you managed?  Once evicted, MySQL<br>

> will be restarted by Pacemaker so all the caches will be cold.<br>

<br>

</div>If I may say so, before you start laughing at people on the list, it may<br>

be a good idea to actually get your facts straight and check what<br>

evict_outdated_slaves does. For a too-far-behind slave it bails out of<br>

monitor with $OCF_ERR_INSTALLED, which Pacemaker considers a hard error.<br>

Thus, that instance will _not_ be restarted by Pacemaker on this node<br>

unless an administrator intervenes.<br>
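<br>
As a rough illustration of the behaviour described above (a sketch only, not the agent's actual code): the function name, its parameters, and the use of Python are assumptions made for this example. The numeric values are the standard OCF return codes.<br>
<pre>
```python
# Standard OCF return codes (0 = success, 5 = "not installed").
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5


def monitor_slave(lag_seconds, max_lag_seconds):
    """Hypothetical monitor check for a slave instance.

    Returns OCF_ERR_INSTALLED when the slave is too far behind, which
    Pacemaker treats as a hard error on this node, so the instance is
    not restarted there until an administrator cleans up the failure.
    """
    if lag_seconds > max_lag_seconds:
        return OCF_ERR_INSTALLED
    return OCF_SUCCESS
```
</pre>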

<br>

Still, Michael, Yves has a point that evict_outdated_slaves is not<br>

optimal (and I'm saying this as the guy that wrote that part of the<br>

agent). It's fine for a temporary problem that affects a single slave,<br>

but please consider this scenario:<br>

<br>

- High load on the database, across several instances.<br>

- Slaves start lagging behind.<br>

- We shut down a slave that is too far behind.<br>

- We now have _fewer_ instances to handle the same load.<br>

- Slaves fall further behind.<br>

- We shut down more slaves.<br>

<br>

This can turn into a cascading failure. Note, specifically, that the<br>

lagging slave has no real option to catch up even when the database<br>

isn't being hammered anymore, unless an admin has intervened and<br>

recovered/restarted the instance manually. And, of course, Yves' point<br>

about cold caches is entirely valid.<br>
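<br>
The cascade above can be sketched with a toy model. Everything here is an assumption made for illustration: load is spread evenly across slaves, a slave lags whenever its share exceeds a fixed capacity, and each round evicts one lagging slave. Real replication lag is far less tidy.<br>
<pre>
```python
def simulate_eviction(total_load, capacity_per_slave, slaves):
    """Return how many slaves remain once eviction stops (toy model)."""
    while slaves > 0:
        load_per_slave = total_load / slaves
        if capacity_per_slave >= load_per_slave:
            break  # the remaining slaves can keep up, so nothing is evicted
        slaves -= 1  # one lagging slave is shut down; the rest absorb its load
    return slaves
```
</pre>
In this model, a total load of 100 with a per-slave capacity of 30 is stable on four slaves, but starting from three slaves every eviction raises the per-slave load further and all of them end up evicted.<br>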

<br>

In Yves' approach, we wouldn't shut down MySQL, but merely shift away<br>

the slave's virtual IP. So while clients can't connect to the slave via<br>

its virtual IP anymore, the slave can still fetch updates from the<br>

master -- and thus, actually has a chance to catch up. Once it's<br>

sufficiently caught up, it gets the VIP back and clients can talk to<br>

that slave again. And since we never stopped MySQL, we also don't have<br>

the cold cache problem.<br>
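<br>
A minimal sketch of the VIP decision described above, not taken from Yves' patch: the function name, both thresholds, and the idea of using a stricter threshold for regaining the VIP than for losing it (to avoid flapping) are assumptions for this example.<br>
<pre>
```python
def vip_allowed(lag_seconds, has_vip, max_lag=60, rejoin_lag=10):
    """Decide whether a slave should hold its virtual IP.

    MySQL keeps running either way, so the slave can keep replicating
    (and keep its caches warm) while clients are steered elsewhere.
    """
    if has_vip:
        # Keep the VIP as long as lag stays within the upper threshold.
        return max_lag >= lag_seconds
    # Give the VIP back only once the slave has sufficiently caught up.
    return rejoin_lag >= lag_seconds
```
</pre>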

<br>

Yves' patches are not perfect (and they're not expected to be, that's<br>

what a review is for), but I think his approach is sound and shouldn't<br>

be shot down simply because evict_outdated_slaves is already there.<br>

<br>

Cheers,<br>

Florian<br>

<span class="HOEnZb"><font color="#888888"><br>

--<br>

Need help with High Availability?<br>

<a href="http://www.hastexo.com/now" target="_blank">http://www.hastexo.com/now</a><br>

</font></span><div class="HOEnZb"><div class="h5"><br>

_______________________________________________<br>

Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>

<a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>

<br>

Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>

Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>

Bugs: <a href="http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker" target="_blank">http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker</a><br>

</div></div></blockquote></div><br>