[ClusterLabs] How to stop resource after N failures?

Fri Oct 27 14:52:37 EDT 2017

Hello,

Is there any way to configure a pacemaker cluster so that a resource will
restart N times and then stop?

This is essentially a mix of migration-threshold and on-failure=stop. Let
me describe my setup and what I've tried.

My cluster is a set of independent apps sharing a VIP (all apps are
co-located with the VIP). If an app fails, I want to try a restart since
the issue might have been ephemeral. For safety, I don't want to keep
restarting indefinitely; three attempts is a reasonable max. I also don't
want one app to cause everything (VIP and all apps) to move. This would
interrupt all established connections. It makes more sense to leave the
one app down and let the working apps stay put.

If I use on-failure=stop, Pacemaker won't attempt a restart. If I use
migration-threshold, one app failing causes all apps and the VIP to move.
I've read through all the documentation, man pages, and message boards I
can find, but I don't see a solution. Any ideas? Is this setup possible in
Pacemaker or do I need to pick between on-failure=stop and
migration-threshold=3?
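
For concreteness, the two alternatives described above might be sketched
with pcs roughly as follows. This is only an illustration under stated
assumptions, not the poster's actual configuration: the resource names
(vip, app1), the agents, the IP address, and the monitor intervals are all
hypothetical placeholders. Note that Pacemaker spells the operation option
on-fail.

```shell
# Hypothetical VIP and one app co-located with it (names, agent, and
# address are placeholders, not taken from the original post).
pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24 \
    op monitor interval=10s
pcs resource create app1 systemd:app1 op monitor interval=30s
pcs constraint colocation add app1 with vip INFINITY

# Alternative A: stop the app on the first monitor failure.
# No restart is attempted.
pcs resource update app1 op monitor interval=30s on-fail=stop

# Alternative B: restart on failure (the default), but after 3 failures
# on a node the app may no longer run there. As described above, this
# ends up moving the VIP and the other co-located apps as well.
pcs resource update app1 op monitor interval=30s on-fail=restart
pcs resource meta app1 migration-threshold=3
```

Neither alternative alone gives "restart up to three times, then leave
only this app stopped", which is the behavior being asked about.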