[ClusterLabs] cluster-recheck-interval and failure-timeout
Antony Stone
Antony.Stone at ha.open.source.it
Wed Mar 31 08:32:32 EDT 2021
Hi.
I'm trying to understand what looks to me like an incorrect interaction between
cluster-recheck-interval and failure-timeout, under pacemaker 2.0.1.
I have three machines in a corosync (3.0.1 if it matters) cluster, managing 12
resources in a single group.
I'm following documentation from:
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html
and
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-resource-options.html
I have set a cluster property:
cluster-recheck-interval=60s
I have set a resource property:
failure-timeout=180
The docs say failure-timeout is "How many seconds to wait before acting as if
the failure had not occurred, and potentially allowing the resource back to
the node on which it failed."
I think this should mean that if the resource fails and gets restarted, the
fact that it failed will be "forgotten" after 180 seconds (or maybe a little
longer, depending on exactly when the next cluster recheck is done).
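
To make that expectation concrete, here is a minimal Python sketch of the
timing I have in mind. It is only a model of my reading of the docs, not of
what pacemaker actually does: the function name forgotten_at, the
recheck_phase parameter, and the assumption that rechecks happen on a fixed
60-second grid are all mine.

```python
import math

def forgotten_at(failure_time, failure_timeout=180, recheck_interval=60,
                 recheck_phase=0):
    """Time (in seconds) at which a failure would be forgotten, ASSUMING the
    cluster only notices expiry at periodic rechecks on a fixed grid.

    This is a simplified model of my expectation, not pacemaker's logic.
    """
    # The failure becomes eligible to expire after failure_timeout seconds.
    expiry = failure_time + failure_timeout
    # It is only acted on at the first recheck at or after that moment.
    ticks = math.ceil((expiry - recheck_phase) / recheck_interval)
    return recheck_phase + ticks * recheck_interval

# A failure exactly on a recheck tick is forgotten after 180 seconds;
# one just after a tick waits for the next recheck.
print(forgotten_at(0))
print(forgotten_at(10))
```

Under this model the failure is forgotten somewhere between 180 and 240
seconds after it happened, never an hour or more later.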
However, what I'm seeing is that if the resource fails and gets restarted, and
then fails again an hour later, it's still counted as two failures. If it
fails and gets restarted another hour after that, it's recorded as three
failures and (because I have "migration-threshold=3") it gets moved to another
node (and therefore all the other resources in the group are moved as well).
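
The difference between what I expect and what I observe can be sketched in
Python as follows. This is a toy model under my own assumptions, not
pacemaker code: the helper fail_counts and the failure times (0, 3600 and
7200 seconds) are illustrative, chosen to match the one-hour spacing
described above.

```python
def fail_counts(failure_times, failure_timeout=None):
    """Return the fail count seen immediately after each failure.

    failure_timeout=None models failures that are never forgotten;
    otherwise failures older than failure_timeout seconds are dropped.
    Hypothetical helper for illustration only.
    """
    live = []      # failures still being counted
    counts = []
    for t in failure_times:
        if failure_timeout is not None:
            # Forget failures that are at least failure_timeout seconds old.
            live = [f for f in live if t - f < failure_timeout]
        live.append(t)
        counts.append(len(live))
    return counts

MIGRATION_THRESHOLD = 3
failures = [0, 3600, 7200]  # one failure per hour, in seconds

expected = fail_counts(failures, failure_timeout=180)  # [1, 1, 1]
observed = fail_counts(failures)                       # [1, 2, 3]

print("expected:", expected, "moves:", max(expected) >= MIGRATION_THRESHOLD)
print("observed:", observed, "moves:", max(observed) >= MIGRATION_THRESHOLD)
```

With failure-timeout=180 I would expect the count to stay at 1 and the
threshold never to be reached; what I see instead matches the second case,
as if the failures were never expiring.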
So, what am I misunderstanding about "failure-timeout", and what configuration
setting do I need to use to tell pacemaker that "provided the resource hasn't
failed within the past X seconds, forget the fact that it failed more than X
seconds ago"?
Thanks,
Antony.
--
The first fifty percent of an engineering project takes ninety percent of the
time, and the remaining fifty percent takes another ninety percent of the time.
Please reply to the list;
please *don't* CC me.