[ClusterLabs] failed resource resurrection - failcount/cleanup etc ?

Wed Jul 10 10:50:44 EDT 2019

On Wed, 2019-07-10 at 11:26 +0100, lejeczek wrote:
> hi guys, possibly @devel if they pop in here.
> 
> is there, or will there be, a way to make the cluster deal with failed
> resources in such a way that it would try not to give up on them?
> 
> I understand that as of now the only way is the user's manual
> intervention (under which I'd include any scripted ways outside of the
> cluster) if we need to bring a failed resource back up.
> 
> many thanks, L.

Not sure what you mean ... the default behavior is to try restarting a
failed resource up to 1,000,000 times on the same node, then try
starting it on a different node, and not give up until all nodes have
failed to start it.

This is affected by on-fail, migration-threshold, failure-timeout, and
start-failure-is-fatal.
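
As a rough illustration of how migration-threshold and failure-timeout
interact, here is a minimal Python sketch. It is a simplified model, not
Pacemaker's actual scheduler code: NodeState, record_failure,
effective_fail_count and choose_node are hypothetical names made up for
this example, and the model ignores on-fail, start-failure-is-fatal,
location scores and fencing. It assumes a node stays eligible while its
fail count is below migration-threshold, and that the whole fail count
expires once failure-timeout has passed since the most recent failure.

```python
from dataclasses import dataclass
from typing import List, Optional

# Pacemaker represents INFINITY as 1,000,000, which is where the
# "up to 1,000,000 times" default comes from.
INFINITY = 1_000_000


@dataclass
class NodeState:
    """Hypothetical per-node failure record for a single resource."""
    name: str
    fail_count: int = 0
    last_failure: Optional[float] = None  # seconds on an illustrative clock


def record_failure(node: NodeState, now: float) -> None:
    """Register one failure of the resource on this node."""
    node.fail_count += 1
    node.last_failure = now


def effective_fail_count(node: NodeState, now: float,
                         failure_timeout: float) -> int:
    """Fail count after applying failure-timeout (0 means never expire)."""
    expired = (
        failure_timeout > 0
        and node.last_failure is not None
        and now - node.last_failure >= failure_timeout
    )
    return 0 if expired else node.fail_count


def choose_node(nodes: List[NodeState], now: float,
                migration_threshold: int = INFINITY,
                failure_timeout: float = 0) -> Optional[str]:
    """Return the first node still allowed to run the resource, or None."""
    for node in nodes:
        if effective_fail_count(node, now, failure_timeout) < migration_threshold:
            return node.name
    return None  # every node has reached the threshold


# Example: three failures on node1 with migration-threshold=3.
nodes = [NodeState("node1"), NodeState("node2")]
for t in (0, 1, 2):
    record_failure(nodes[0], now=t)

# node1 has reached the threshold, so the resource moves to node2.
print(choose_node(nodes, now=10, migration_threshold=3))

# With failure-timeout=60, node1's failures have expired by t=100,
# so node1 becomes eligible again.
print(choose_node(nodes, now=100, migration_threshold=3, failure_timeout=60))
```

With the defaults (migration-threshold of INFINITY, failure-timeout of 0)
this model keeps choosing the original node, which matches the "not give
up" behavior described above.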

If you're talking about a resource that failed because the entire node
failed, then fencing comes into play.
-- 
Ken Gaillot <kgaillot at redhat.com>