[ClusterLabs] failed resource resurrection - failcount/cleanup etc ?
Ken Gaillot
kgaillot at redhat.com
Wed Jul 10 10:50:44 EDT 2019
On Wed, 2019-07-10 at 11:26 +0100, lejeczek wrote:
> hi guys, possibly @devel if they pop in here.
>
> is there, will there be, a way to make cluster deal with failed
> resources in such a way that cluster would try not to give up on
> failed resources?
>
> I understand that as of now the only way is user's manual
> intervention (under which I'd include any scripted ways outside of
> the cluster) if we need to bring back up a failed resource.
>
> many thanks, L.

Not sure what you mean ... the default behavior is to try restarting a
failed resource up to 1,000,000 times on the same node, then try
starting it on a different node, and not give up until all nodes have
failed to start it.

This is affected by on-fail, migration-threshold, failure-timeout, and
start-failure-is-fatal.

If you're talking about a resource that failed because the entire node
failed, then fencing comes into play.
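
For illustration, the options named above could be set with pcs roughly
as follows. This is only a sketch: it assumes the cluster is managed
with pcs (crmsh has equivalents), and the resource name my-rsc and all
values are hypothetical placeholders, not recommendations.

```shell
# Hypothetical resource name; substitute your own.
RSC=my-rsc

# Resource meta-attributes: move the resource to another node after 3
# failures on the current one, and let recorded failures expire after
# 600 seconds so the node can become eligible again.
pcs resource meta "$RSC" migration-threshold=3 failure-timeout=600s

# on-fail is set per operation; here a failed monitor leads to a
# restart of the resource (assumes a 30s monitor operation).
pcs resource update "$RSC" op monitor interval=30s on-fail=restart

# Cluster-wide property: when false, a failed start counts toward
# migration-threshold instead of immediately ruling out the node.
pcs property set start-failure-is-fatal=false

# Inspect the current fail count, then clear it by hand if desired.
pcs resource failcount show "$RSC"
pcs resource cleanup "$RSC"
```

With a failure-timeout in place, old failures age out on their own, so
a manual cleanup is mainly needed when you want the resource retried
right away.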
--
Ken Gaillot <kgaillot at redhat.com>