[ClusterLabs] failed resource resurection - failcount/cleanup etc ?
peljasz at yahoo.co.uk
Thu Jul 11 05:39:44 EDT 2019
On 10/07/2019 15:50, Ken Gaillot wrote:
> On Wed, 2019-07-10 at 11:26 +0100, lejeczek wrote:
>> hi guys, possibly @devel if they pop in here.
>> is there, or will there be, a way to make the cluster deal with failed
>> resources in such a way that the cluster would try not to give up on
>> them? I understand that as of now the only way is the user's manual
>> intervention (under which I'd include any scripted ways outside of the
>> cluster) if one needs to bring back up a failed resource.
>> many thanks, L.
> Not sure what you mean ... the default behavior is to try restarting a
> failed resource up to 1,000,000 times on the same node, then try
> starting it on a different node, and not give up until all nodes have
> failed to start it.
> This is affected by on-fail, migration-threshold, failure-timeout, and
> If you're talking about a resource that failed because the entire node
> failed, then fencing comes into play.
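
The escalation described in the reply above (retry on the same node until the fail count reaches migration-threshold, then try another node, and stop only when every node is exhausted) can be sketched as a toy model. This is only an illustration, not Pacemaker's actual placement logic, which works with scores and constraints; the function name `pick_node` and the node names are hypothetical.

```python
INFINITY = 1_000_000  # the ceiling of 1,000,000 quoted in the reply above


def pick_node(nodes, fail_counts, migration_threshold=INFINITY):
    """Return the first node still allowed to run the resource, or None.

    Toy model: a node is allowed while its fail count for the resource
    is below migration_threshold. Real Pacemaker also weighs location
    scores, stickiness and constraints, all of which are ignored here.
    """
    for node in nodes:
        if fail_counts.get(node, 0) < migration_threshold:
            return node
    return None  # every node is exhausted, so the resource stays stopped


# Hypothetical two-node cluster with migration-threshold=3
nodes = ["node1", "node2"]
first = pick_node(nodes, {"node1": 2}, migration_threshold=3)
moved = pick_node(nodes, {"node1": 3}, migration_threshold=3)
stuck = pick_node(nodes, {"node1": 3, "node2": 3}, migration_threshold=3)
```

With these inputs `first` is still `node1`, `moved` is `node2`, and `stuck` is `None`, which corresponds to the "failed on every node" situation discussed in the rest of the thread.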
Apologies, I was not clear enough in wording my question; I see
that now. When I said "make the cluster deal with failed resources" I
meant a resource which has failed in the whole cluster, i.e. on every node.
If that happens, I see that only my manual intervention can make the
cluster look at the resource again, and I wonder whether I am simply
unaware of ways to have the cluster act by itself, without needing me,
rather than give up.
My case is: a systemd resource whose success or failure is
determined by a mechanism outside of the cluster; it can only
start successfully on one single node. When that node reboots, the
cluster fails this resource, and when the node is up again the
failed resource remains in the failed state.
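
One of the options named earlier in the thread, failure-timeout, lets recorded failures expire so that the cluster may retry without a manual cleanup. Whether it suits this particular systemd resource is an assumption. The sketch below is a toy model of that expiry rule only, with hypothetical function names and timings in seconds; it is not how Pacemaker is implemented.

```python
INFINITY = 1_000_000  # same ceiling as quoted earlier in the thread


def effective_fail_count(fail_count, last_failure, now, failure_timeout=0):
    """Toy model: forget failures older than failure_timeout seconds.

    A failure_timeout of 0 means failures never expire, which matches
    the situation described above where the resource stays failed
    until someone cleans it up by hand.
    """
    if failure_timeout > 0 and now - last_failure >= failure_timeout:
        return 0
    return fail_count


def can_retry(fail_count, last_failure, now,
              failure_timeout=0, migration_threshold=INFINITY):
    """True if the node is allowed to try starting the resource again."""
    count = effective_fail_count(fail_count, last_failure, now, failure_timeout)
    return count < migration_threshold


# Resource failed at t=0 with the maximum fail count; the node is back at t=3600.
never_expires = can_retry(INFINITY, last_failure=0, now=3600)
expired = can_retry(INFINITY, last_failure=0, now=3600, failure_timeout=600)
too_early = can_retry(INFINITY, last_failure=0, now=300, failure_timeout=600)
```

In this model only `expired` is true: without a timeout the failure blocks the node indefinitely, and with a 600-second timeout the retry becomes possible only once that interval has passed.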
Hopefully I managed to make it a bit clearer this time.
Many thanks, L.