[ClusterLabs] failure-timeout not working in corosync 2.0.1
Antony Stone
Antony.Stone at ha.open.source.it
Wed Mar 31 17:52:42 EDT 2021
On Wednesday 31 March 2021 at 23:09:38, Antony Stone wrote:
> On Wednesday 31 March 2021 at 22:53:53, Reid Wahl wrote:
> > Hi, Antony. failure-timeout should be a resource meta attribute, not an
> > attribute of the monitor operation. At least I'm not aware of it being
> > configurable per-operation -- maybe it is. Can't check at the moment :)
>
> Okay, I'll try moving it - but that still leaves me wondering why it works
> fine in pacemaker 1.1.16 and not in 2.0.1.
*Thank you, Reid*
It works.
Moving the failure-timeout specification to the "meta" section of the resource
definition has caused the failures to disappear from "crm status -f" after the
expected amount of time.
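
For anyone finding this thread later, the corrected layout looks roughly like
this in crmsh syntax (the resource name, agent and values here are just
placeholders, not my actual configuration) - the point being that
failure-timeout sits in the meta section, not on the monitor op:

    primitive svc_example ocf:heartbeat:Dummy \
        op monitor interval=10s timeout=20s \
        meta failure-timeout=120s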
I am hoping that this also means the resources are no longer going to move from
node 1 to node 2 to node 3 and then get totally stuck.
I shall find out for sure by tomorrow (it's nearly midnight where I am now).
I already know what I need to do to stop this particular resource from having
to be restarted so frequently, but the fact that the 2.0.1 cluster couldn't
cope with it at all made me nervous about just doing that, and then never
being confident that the cluster _could_ cope if a resource really needed to be
restarted several times.
Pacemaker 1.1.16 could cope with the configuration fine, even though I was
clearly putting failure-timeout into the wrong place in cluster.cib.
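
What I had previously was along these lines (again, placeholder names and
values), with failure-timeout attached to the monitor operation itself, which
1.1.16 apparently tolerated and 2.0.1 does not:

    primitive svc_example ocf:heartbeat:Dummy \
        op monitor interval=10s timeout=20s failure-timeout=120s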
Once again, thank you Reid.
Antony.
--
What do you get when you cross a joke with a rhetorical question?
Please reply to the list;
please *don't* CC me.