[ClusterLabs] Antw: Re: Antw: [EXT] Re: cluster-recheck-interval and failure-timeout

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Wed Apr 7 04:40:54 EDT 2021


>>> Ken Gaillot <kgaillot at redhat.com> wrote on 06.04.2021 at 15:58 in
message
<dd0e25837f82746a4363c216685f3ec5a01ca8a0.camel at redhat.com>:
> On Tue, 2021-04-06 at 09:15 +0200, Ulrich Windl wrote:
>> > > > Ken Gaillot <kgaillot at redhat.com> wrote on 31.03.2021 at
>> > > > 15:48 in
>> 
>> message
>> <7dfc7c46442db17d9645854081f1269261518f84.camel at redhat.com>:
>> > On Wed, 2021-03-31 at 14:32 +0200, Antony Stone wrote:
>> > > Hi.
>> > > 
>> > > I'm trying to understand what looks to me like incorrect behaviour
>> > > between cluster-recheck-interval and failure-timeout, under
>> > > pacemaker 2.0.1
>> > > 
>> > > I have three machines in a corosync (3.0.1 if it matters) cluster,
>> > > managing 12 resources in a single group.
>> > > 
>> > > I'm following documentation from:
>> > > 
>> > > https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html
>> > > 
>> > > and
>> > > 
>> > > https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-resource-options.html
>> > > 
>> > > I have set a cluster property:
>> > > 
>> > > 	cluster-recheck-interval=60s
>> > > 
>> > > I have set a resource property:
>> > > 
>> > > 	failure-timeout=180
>> > > 
>> > > The docs say failure-timeout is "How many seconds to wait before
>> > > acting as if the failure had not occurred, and potentially allowing
>> > > the resource back to the node on which it failed."
>> > > 
>> > > I think this should mean that if the resource fails and gets
>> > > restarted, the fact that it failed will be "forgotten" after 180
>> > > seconds (or maybe a little longer, depending on exactly when the
>> > > next cluster recheck is done).
>> > > 
>> > > However what I'm seeing is that if the resource fails and gets
>> > > restarted, and this then happens an hour later, it's still counted
>> > > as two failures.  If it
>> > 
>> > That is exactly correct.
>> > 
>> > > fails and gets restarted another hour after that, it's recorded as
>> > > three failures and (because I have "migration-threshold=3") it gets
>> > > moved to another node (and therefore all the other resources in the
>> > > group are moved as well).
>> > > 
>> > > So, what am I misunderstanding about "failure-timeout", and what
>> > > configuration setting do I need to use to tell pacemaker that
>> > > "provided the resource hasn't failed within the past X seconds,
>> > > forget the fact that it failed more than X seconds ago"?
>> > 
>> > Unfortunately, there is no way. failure-timeout expires *all* failures
>> > once the *most recent* is that old. It's a bit counter-intuitive but
>> > currently, Pacemaker only remembers a resource's most recent failure
>> > and the total count of failures, and changing that would be a big
>> > project.
>> 
>> Hi!
>> 
>> Sorry, I don't get it: if you have a timestamp for each failure, what's so
>> hard about putting all the fail counts that are older than failure-timeout
>> on a list and then resetting that list to zero?
> 
> That's exactly the issue -- we don't have a timestamp for each failure.
> Only the most recent failed operation, and the total fail count (per
> resource and operation), are stored in the CIB status.
> 
> We could store all failures in the CIB, but that would be a significant
> project, and we'd need new options to keep the current behavior as the
> default.

Hi!

I still don't quite get it: a failing operation increases the fail-count,
and the timestamp of that failing operation is recorded (crm_mon can display
it). So solving this problem (keeping a timestamp for each counted failure)
doesn't look so hard to do.
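
To make the difference concrete, here is a minimal Python sketch of the two
bookkeeping schemes being discussed (purely an illustration of the semantics,
not Pacemaker code; the class and method names are made up):

class CurrentModel:
    """Roughly what the CIB status holds today: only a total fail count and
    the timestamp of the most recent failure, so expiry is all-or-nothing."""
    def __init__(self):
        self.fail_count = 0
        self.last_failure = None

    def record_failure(self, now):
        self.fail_count += 1
        self.last_failure = now

    def effective_fail_count(self, failure_timeout, now):
        if self.last_failure is None:
            return 0
        if now - self.last_failure >= failure_timeout:
            return 0                # all failures expire together
        return self.fail_count      # until then, none expire individually


class PerFailureModel:
    """The alternative suggested here: keep one timestamp per failure so
    each failure can expire on its own."""
    def __init__(self):
        self.failure_times = []

    def record_failure(self, now):
        self.failure_times.append(now)

    def effective_fail_count(self, failure_timeout, now):
        # Prune failures older than failure-timeout and count the rest.
        self.failure_times = [t for t in self.failure_times
                              if now - t < failure_timeout]
        return len(self.failure_times)


if __name__ == "__main__":
    # Two failures 100 seconds apart, failure-timeout=180, checked at t=200,
    # when the first failure is already older than 180s but the second is not.
    for model in (CurrentModel(), PerFailureModel()):
        model.record_failure(now=0)
        model.record_failure(now=100)
        print(type(model).__name__, model.effective_fail_count(180, now=200))
    # Prints: CurrentModel 2    (the newest failure is still fresh, so
    #                            nothing has expired)
    #         PerFailureModel 1 (the older failure expired on its own)

With only the first scheme's data available in the CIB status, the per-failure
expiry that "everyone expects" simply cannot be computed, which is the point
made above; the second scheme would require storing a timestamp for every
failure.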

Regards,
Ulrich


> 
>> I mean: That would be what everyone expects.
>> What is implemented instead is like FIFO scheduling: As long as there is
>> a new entry at the head of the queue, the jobs at the tail will never be
>> executed.
>> 
>> Regards,
>> Ulrich
>> 
>> > 
>> > 
>> > > Thanks,
>> > > 
>> > > 
>> > > Antony.
>> > > 
>> > 
>> > -- 
>> > Ken Gaillot <kgaillot at redhat.com>
>> > 
> -- 
> Ken Gaillot <kgaillot at redhat.com>
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> ClusterLabs home: https://www.clusterlabs.org/ 




