[ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.
Ken Gaillot
kgaillot at redhat.com
Wed Sep 21 16:17:38 EDT 2016
On 09/20/2016 07:51 PM, Andrew Beekhof wrote:
>
>
> On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot <kgaillot at redhat.com> wrote:
>
> > Hi everybody,
> >
> > Currently, Pacemaker's on-fail property allows you to configure how the
> > cluster reacts to operation failures. The default, "restart", means try to
> > restart on the same node, optionally moving to another node once
> > migration-threshold is reached. Other possibilities are "ignore",
> > "block", "stop", "fence", and "standby".
> >
> > Occasionally, we get requests to have something like migration-threshold
> > for values besides restart. For example, try restarting the resource on
> > the same node 3 times, then fence.
> >
> > I'd like to get your feedback on two alternative approaches we're
> > considering.
> >
> > ###
> >
> > Our first proposed approach would add a new hard-fail-threshold
> > operation property. If specified, the cluster would first try restarting
> > the resource on the same node,
>
>
> Well, just as now, it would be _allowed_ to start on the same node, but
> this is not guaranteed.
>
>
> > before doing the on-fail handling.
>
> > For example, you could configure a promote operation with
> > hard-fail-threshold=3 and on-fail=fence, to fence the node after 3
> > failures.
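Written out, that example might look like this (hypothetical syntax;
hard-fail-threshold does not exist in any release):

   <!-- hypothetical: tolerate 2 promote failures, fence on the 3rd -->
   <op id="rsc1-promote" name="promote" interval="0"
       on-fail="fence" hard-fail-threshold="3"/>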
>
>
> > One point that's not settled is whether failures of *any* operation
> > would count toward the 3 failures (which is how migration-threshold
> > works now), or only failures of the specified operation.
>
>
> I think if hard-fail-threshold is per-op, then only failures of that
> operation should count.
>
>
>
> > Currently, if a start fails (but is retried successfully), then a
> > promote fails (but is retried successfully), then a monitor fails, the
> > resource will move to another node if migration-threshold=3. We could
> > keep that behavior with hard-fail-threshold, or only count monitor
> > failures toward monitor's hard-fail-threshold. Each alternative has
> > advantages and disadvantages.
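Today that aggregate counter is stored as a single transient node attribute
per resource in the CIB status section, along these lines (sketch; ids
abbreviated):

   <transient_attributes id="node1">
     <instance_attributes id="status-node1">
       <!-- one combined count covering all of rsc1's failed operations -->
       <nvpair id="status-node1-fc" name="fail-count-rsc1" value="3"/>
       <!-- epoch timestamp of the most recent failure -->
       <nvpair id="status-node1-lf" name="last-failure-rsc1" value="1474485179"/>
     </instance_attributes>
   </transient_attributes>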
> >
> > ###
> >
> > The second proposed approach would add a new on-restart-fail resource
> > property.
> >
> > Same as now, on-fail handling for anything but restart would happen
> > immediately after the first failure. A new value, "ban", would
> > immediately move the resource to another node. (on-fail=ban would behave
> > like on-fail=restart with migration-threshold=1.)
> >
> > When on-fail=restart, and restarting on the same node doesn't work, the
> > cluster would do the on-restart-fail handling. on-restart-fail would
> > allow the same values as on-fail (minus "restart"), and would default to
> > "ban".
>
>
> I do wish you well tracking "is this a restart" across demote -> stop ->
> start -> promote in 4 different transitions :-)
>
>
>
> > So, if you want to fence immediately after any promote failure, you
> > would still configure on-fail=fence; if you want to try restarting a few
> > times first, you would configure on-fail=restart and
> > on-restart-fail=fence.
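In op terms (hypothetical syntax; the two ops are alternatives, not meant
to coexist):

   <!-- fence on the first promote failure: -->
   <op id="ms1-promote" name="promote" interval="0" on-fail="fence"/>
   <!-- or retry in place first, then fence, with on-restart-fail=fence
        set as a meta-attribute on the resource: -->
   <op id="ms1-promote" name="promote" interval="0" on-fail="restart"/>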
> >
> > This approach keeps the current threshold behavior -- failures of any
> > operation count toward the threshold. We'd rename migration-threshold to
> > something like hard-fail-threshold, since it would apply to more than
> > just migration, but unlike the first approach, it would stay a resource
> > property.
> >
> > ###
> >
> > Comparing the two approaches, the first is more flexible, but also more
> > complex and potentially confusing.
>
>
> More complex to implement or more complex to configure?
I was thinking more complex in behavior, so perhaps harder to follow and
predict.
For example, "After two start failures, fence this node; after three
promote failures, put the node in standby; but if a monitor failure is
the third operation failure of any type, then move the resource to
another node."
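Spelled out in the proposed per-op syntax, that would be something like
(hypothetical, mirroring the prose above):

   <operations>
     <op id="rsc1-start" name="start" interval="0"
         on-fail="fence" hard-fail-threshold="2"/>
     <op id="rsc1-promote" name="promote" interval="0"
         on-fail="standby" hard-fail-threshold="3"/>
     <!-- monitor falls back to on-fail=restart with migration-threshold=3 -->
     <op id="rsc1-mon" name="monitor" interval="10s" on-fail="restart"/>
   </operations>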
Granted, someone would have to inflict that on themselves :) but another
sysadmin / support tech / etc. who had to deal with the config later
might have trouble following it.
To keep the current default behavior, the default would be complicated,
too: "1 for start and stop operations, and 0 for other operations" where
"0 is equivalent to 1 except when on-fail=restart, in which case
migration-threshold will be used instead".
And then add to that tracking fail-count per node+resource+operation
combination, with the associated status output and cleanup options.
"crm_mon -f" currently shows failures like:
* Node node1:
   rsc1: migration-threshold=3 fail-count=1 last-failure='Wed Sep 21 15:12:59 2016'
What should that look like with per-op thresholds and fail-counts?
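One possible rendering, just to make the question concrete (purely
speculative output, reusing the example thresholds from above):

   * Node node1:
      rsc1: fail-count=1 last-failure='Wed Sep 21 15:12:59 2016'
        start: hard-fail-threshold=2 fail-count=0
        promote: hard-fail-threshold=3 fail-count=1
        monitor: migration-threshold=3 fail-count=0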
I'm not saying it's a bad idea, just that it's more complicated than it
first sounds, so it's worth thinking through the implications.
> > With either approach, we would deprecate the start-failure-is-fatal
> > cluster property. start-failure-is-fatal=true would be equivalent to
> > hard-fail-threshold=1 with the first approach, and on-fail=ban with the
> > second approach. This would be both simpler and more useful -- it allows
> > the value to be set differently per resource.
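For reference, the cluster-wide knob being deprecated is real, current
syntax and lives in crm_config:

   <cluster_property_set id="cib-bootstrap-options">
     <!-- true (the default): a single failed start immediately bans the
          resource from that node -->
     <nvpair id="cib-bootstrap-options-sfif"
             name="start-failure-is-fatal" value="true"/>
   </cluster_property_set>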
> > --
> > Ken Gaillot <kgaillot at redhat.com>