[ClusterLabs] Antw: Retries before setting fail-count to INFINITY

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Aug 21 09:39:16 EDT 2017


>>> Vaibhaw Pandey <vabu.vayu at gmail.com> wrote on 21.08.2017 at 14:58 in
message
<CAAdwLTsZMX5fD=RsA7k1DKgMKoZ51A0jM=Hay4rUB4EF44Z7PA at mail.gmail.com>:
> Version in use: 1.1 along with corosync 1.4
> 
> Hello,
> I am new to pacemaker and was trying to set up a MySQL master/slave cluster
> using pacemaker and had a question on resource failure response which I
> couldn't resolve from the documentation.
> 
> The pacemaker doc (
> https://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_failure_response.html)
> says clearly that:
> 
> "Normally, if a running resource fails, pacemaker will try to stop it and
> start it again."
> 
> I was wondering if there is a way to configure the # of times pacemaker
> will attempt this start and stop sequence - we want to try and restart the
> resource 2 or 3 times before it is stopped. Obviously setting a

Maybe you misunderstood: A stopped resource is the precondition for a successful start. So before any start attempt of a failed resource comes a stop attempt. If your monitor times out, try to increase the monitor timeout; if it causes false alerts, fix the monitor. If the database is crap, replace the database ;-)

> migration-threshold doesn't work in this case because the moment the 1st
> attempt to restart the resource fails, fail-count is set to INFINITY. Our
> failure-timeout is set to default (0).

Yes, the cluster cannot predict the future: If the resource failed to start, it's unlikely that repeating the same thing will suddenly succeed. It's more likely that the start will succeed elsewhere (disregarding configuration errors).
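
To make the difference between a failed monitor and a failed start concrete, here is a toy model in Python. It is only a sketch of the documented behaviour, not Pacemaker's code: the function names are made up, and it assumes that a failed start jumps the fail-count to INFINITY only while the cluster property start-failure-is-fatal is at its default (true). As far as I know that property exists in 1.1, but please check it against your version before relying on it.

```python
# Toy model of fail-count bookkeeping; NOT Pacemaker source code.
# Function and parameter names are hypothetical.

INFINITY = 1_000_000  # numeric value Pacemaker uses for the INFINITY score


def fail_count_after(failed_ops, start_failure_is_fatal=True):
    """Return the fail-count on one node after a sequence of failed operations.

    failed_ops is a list of operation names such as "monitor" or "start".
    """
    count = 0
    for op in failed_ops:
        if op == "start" and start_failure_is_fatal:
            # A fatal start failure jumps straight to INFINITY.
            count = INFINITY
        else:
            # Any other failure is counted one by one.
            count = min(count + 1, INFINITY)
    return count


def node_is_banned(fail_count, migration_threshold):
    """A threshold of 0 disables the feature; otherwise compare the counts."""
    return migration_threshold > 0 and fail_count >= migration_threshold


if __name__ == "__main__":
    # Two monitor failures with migration-threshold=3: still allowed here.
    print(node_is_banned(fail_count_after(["monitor", "monitor"]), 3))
    # Monitor failure followed by a failed restart: banned at once.
    print(node_is_banned(fail_count_after(["monitor", "start"]), 3))
    # Same sequence with start-failure-is-fatal=false: counted normally.
    print(node_is_banned(fail_count_after(["monitor", "start"], False), 3))
```

In this model migration-threshold only helps if start failures are counted like any other failure; with the default, the first failed restart overrides it.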

> 
> The reason we wish to do this is that, at times the database is busy and
> the monitor action fails. However there is a good chance it might succeed
> on a second or third attempt.

By "it", do you mean the "monitor" operation?

> 
> Is there a parameter in pacemaker that we can utilize to cause this
> behavior or will this have to be coded in the resource agent?

See above.
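
If you do decide to retry inside the resource agent's monitor, the idea could look roughly like the Python sketch below. Real agents are usually shell scripts, so take this as an illustration only: monitor_with_retries, probe, attempts and delay are hypothetical names, and only the two OCF return codes are taken from the OCF resource agent conventions.

```python
import time

# OCF return codes used by resource agents.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def monitor_with_retries(probe, attempts=3, delay=1.0, sleep=time.sleep):
    """Run probe() up to `attempts` times and report one monitor result.

    probe is a hypothetical callable that returns True when the database
    answers; an exception is treated like a failed probe.
    """
    for attempt in range(1, attempts + 1):
        try:
            if probe():
                return OCF_SUCCESS
        except Exception:
            pass  # a busy or unreachable database counts as a failed try
        if attempt < attempts:
            sleep(delay)  # wait before the next try, but not after the last
    return OCF_ERR_GENERIC


if __name__ == "__main__":
    answers = iter([False, False, True])  # succeeds on the third try
    print(monitor_with_retries(lambda: next(answers), delay=0))
```

Note that all retries have to finish within the monitor operation's timeout, so the timeout must be larger than attempts times (probe time plus delay); otherwise the cluster sees a timeout regardless of the retries.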

> 
> Thanks,
> Vaibhaw







