[ClusterLabs] How does failure-timeout work? Will the resource not be scheduled when it is set too short?
Ken Gaillot
kgaillot at redhat.com
Sun May 27 16:38:43 EDT 2018
On Sun, 2018-05-20 at 10:19 +0800, lkxjtu wrote:
>
> I have two pacemaker resources, which we will call A and B. For
> environmental reasons, their start and monitor methods always return
> failure (OCF_ERR_GENERIC). Their configurations follow (the cluster
> property start-failure-is-fatal is set to false):
>
> primitive A A \
>     op monitor interval=20 timeout=120 \
>     op stop interval=0 timeout=120 on-fail=restart \
>     op start interval=0 timeout=240 on-fail=restart \
>     meta failure-timeout=60s
> primitive B B \
>     op monitor interval=20 timeout=120 \
>     op stop interval=0 timeout=120 on-fail=restart \
>     op start interval=0 timeout=240 on-fail=restart \
>     meta failure-timeout=60s
> clone A_cl A
> clone B_cl B
>
> Their operations take different amounts of time:
> A: start = 60s, monitor < 1s, stop = 80s
> B: start < 1s, monitor < 1s, stop < 1s
>
> Resource A is scheduled normally: it is started and stopped over and
> over. For resource B, however, only the recurring monitor fails
> repeatedly; it is never stopped or started. No fail-count for B is
> shown in "crm status -f".
>
> Either of two changes gets B scheduled again:
> 1. Set failure-timeout of B from 60s to 600s.
> 2. Modify the OCF agent of A so that its stop method returns as soon
>    as possible.
>
> I tested it several times, and the results were the same. Why is the
> resource not scheduled when failure-timeout is set too short? And
> what does that have to do with the slow stop of another resource? Is
> this a bug?
>
> My pacemaker version is 1.1.16. Any suggestion is welcome. Thank you!
>
>
> James
> 2018-05-20
That behavior is unexpected. Can you share logs?
--
Ken Gaillot <kgaillot at redhat.com>