[Pacemaker] Adding VIP support for the MySQL RA

Florian Haas florian at hastexo.com
Sat Nov 12 14:51:28 EST 2011

Hi Yves and Michael,

On 2011-11-12 19:22, Yves Trudeau wrote:
> lol... How many large databases have you managed?  Once evicted, MySQL
> will be restarted by Pacemaker so all the caches will be cold.

If I may say so, before you start laughing at people on the list, it may
be a good idea to actually get your facts straight and check what
evict_outdated_slaves does. For a slave that has fallen too far behind,
the monitor action exits with $OCF_ERR_INSTALLED, which Pacemaker treats
as a hard error. Thus, that instance will _not_ be restarted by Pacemaker
on this node unless an administrator intervenes.
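
To make that behaviour concrete, here is a rough model of it. This is a
Python sketch, not the agent's actual shell code: the function name, its
parameters and the "0 disables the check" convention are assumptions made
for illustration. Only the numeric OCF return codes come from the OCF
resource agent API.

```python
# Return codes as defined by the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5  # hard error: no further start attempts on this node


def monitor_slave(seconds_behind, max_lag):
    """Hypothetical model of a monitor that evicts a lagging slave.

    seconds_behind: replication lag in seconds, or None if unknown.
    max_lag: illustrative threshold in seconds; 0 disables the check
             (an assumption of this sketch, not taken from the agent).
    """
    if max_lag > 0 and seconds_behind is not None and seconds_behind > max_lag:
        # Too far behind: report a hard error so the cluster manager
        # stops the instance and does not restart it on this node.
        return OCF_ERR_INSTALLED
    return OCF_SUCCESS
```

The point to take away is the return code: a soft error would lead to a
restart attempt on the same node, a hard error does not.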

Still, Michael, Yves has a point that evict_outdated_slaves is not
optimal (and I'm saying this as the guy who wrote that part of the
agent). It's fine for a temporary problem that affects a single slave,
but please consider this scenario:

- High load on the database, across several instances.
- Slaves start lagging behind.
- We shut down a slave that is too far behind.
- We now have _fewer_ instances to handle the same load.
- Slaves fall further behind.
- We shut down more slaves.

This can turn into a cascading failure. Note, specifically, that the
lagging slave has no real option to catch up even when the database
isn't being hammered anymore, unless an admin has intervened and
recovered/restarted the instance manually. And, of course, Yves' point
about cold caches is entirely valid.
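
The cascade described above can be illustrated with a toy simulation. It
is only a sketch under stated assumptions: read load is spread evenly
over the surviving slaves, each slave has a fixed per-tick capacity, lag
grows by whatever load exceeds that capacity, and an evicted slave never
comes back because nobody intervenes. None of these names or numbers come
from the agent itself.

```python
def simulate_eviction(capacities, total_load, max_lag, steps):
    """Toy model of evicting slaves that lag too far behind.

    capacities: per-tick capacity of each slave (hypothetical units).
    total_load: read load per tick, spread evenly over surviving slaves.
    max_lag:    a slave whose lag exceeds this is evicted for good.
    Returns the number of surviving slaves after each tick.
    """
    lags = {i: 0.0 for i in range(len(capacities))}
    history = []
    for _ in range(steps):
        if lags:
            share = total_load / len(lags)
            for i in list(lags):
                # Lag grows when the slave's share exceeds its capacity,
                # and shrinks (never below zero) otherwise.
                lags[i] = max(0.0, lags[i] + share - capacities[i])
                if lags[i] > max_lag:
                    del lags[i]  # evicted; no admin brings it back
        history.append(len(lags))
    return history
```

With capacities of 100, 130 and 150, a constant load of 330 and a lag
limit of 30, this model yields 3, 3, 3, 2, 1, 0 surviving slaves over six
ticks: only the weakest slave is overloaded at first, but each eviction
pushes more load onto the rest until none are left.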

In Yves' approach, we wouldn't shut down MySQL, but merely shift away
the slave's virtual IP. So while clients can't connect to the slave via
its virtual IP anymore, the slave can still fetch updates from the
master -- and thus, actually has a chance to catch up. Once it's
sufficiently caught up, it gets the VIP back and clients can talk to
that slave again. And since we never stopped MySQL, we also don't have
the cold cache problem.
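
For comparison, here is the same kind of toy model for the VIP approach.
Again this is a sketch under assumptions, not a description of Yves'
patches: a slave without its VIP is assumed to receive no client load at
all, to keep replicating at its full capacity, and to get the VIP back
as soon as its lag is within the limit. All names and numbers are
hypothetical.

```python
def simulate_vip(capacities, load_per_tick, max_lag):
    """Toy model of moving the VIP away instead of stopping MySQL.

    capacities:    per-tick capacity of each slave (hypothetical units).
    load_per_tick: total read load for each tick, spread evenly over
                   the slaves that currently hold their VIP.
    max_lag:       a slave holds its VIP only while lag <= max_lag.
    Returns the number of slaves holding a VIP after each tick.
    """
    lags = [0.0] * len(capacities)
    has_vip = [True] * len(capacities)
    history = []
    for total_load in load_per_tick:
        serving = sum(has_vip)
        share = total_load / serving if serving else 0.0
        for i, cap in enumerate(capacities):
            # A slave without its VIP serves no clients, so all of its
            # capacity goes into catching up with the master.
            load = share if has_vip[i] else 0.0
            lags[i] = max(0.0, lags[i] + load - cap)
            has_vip[i] = lags[i] <= max_lag
        history.append(sum(has_vip))
    return history
```

With the same capacities (100, 130, 150) and lag limit (30), six ticks of
load 330 followed by two ticks of load 90 give 3, 3, 3, 2, 2, 2, 3, 3
slaves holding a VIP. Slaves drop out of rotation under load, but they
catch up and return on their own once they are relieved, instead of
staying down until an admin steps in.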

Yves' patches are not perfect (and they're not expected to be, that's
what a review is for), but I think his approach is sound and shouldn't
be shot down simply because evict_outdated_slaves is already there.

