[ClusterLabs] Pacemaker's "stonith too many failures" log is not accurate

井上 和徳 inouekazu at intellilink.co.jp
Thu May 18 12:37:41 CEST 2017


Hi Ken,
thank you for your comment.
I'll try to check the behavior.

> -----Original Message-----
> From: Ken Gaillot [mailto:kgaillot at redhat.com]
> Sent: Wednesday, May 17, 2017 11:09 PM
> To: users at clusterlabs.org
> Subject: Re: [ClusterLabs] Pacemaker's "stonith too many failures" log is not accurate
> 
> On 05/17/2017 04:56 AM, Klaus Wenninger wrote:
> > On 05/17/2017 11:28 AM, 井上 和徳 wrote:
> >> Hi,
> >> I'm testing Pacemaker-1.1.17-rc1.
> >> The number of failures in the "Too many failures (10) to fence" log message does not match the actual number of failures.
> >
> > Well, it kind of does: after 10 failures it doesn't try fencing again,
> > so that is where the failure count stays ;-)
> > Of course it still sees the need to fence but doesn't actually try.
> >
> > Regards,
> > Klaus
> 
> This feature can be a little confusing: it doesn't prevent all further
> fence attempts of the target, just *immediate* fence attempts. Whenever
> the next transition is started for some other reason (a configuration or
> state change, cluster-recheck-interval, node failure, etc.), it will try
> to fence again.
> 
> Also, it only checks this threshold if it's aborting a transition
> *because* of this fence failure. If it's aborting the transition for
> some other reason, the number can go higher than the threshold. That's
> what I'm guessing happened here.
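Ken's description can be illustrated with a toy model. The shell function below is hypothetical and is not Pacemaker's implementation: it counts every failed fencing attempt, but compares the count against the limit only when the transition was aborted because of that fencing failure. The function name, the event names `stonith` and `other`, and the `>=` comparison are all assumptions made for illustration.

```shell
# Hypothetical model (NOT Pacemaker source) of the rule described above:
# every failed fencing attempt increments the per-target count, but the
# threshold is compared only when the abort was caused by that failure.
# The ">=" comparison is an assumption.
#
# Arguments: MAX_ATTEMPTS, then a list of events, each either
#   stonith - fencing failed and the transition was aborted because of it
#   other   - fencing failed, but the transition was aborted for another reason
model_fence_threshold() {
    max=$1
    shift
    count=0
    for ev in "$@"; do
        count=$((count + 1))
        # Only a "stonith" abort triggers the threshold check.
        if [ "$ev" = stonith ] && [ "$count" -ge "$max" ]; then
            echo "Too many failures ($count), giving up"
            return 0
        fi
    done
    echo "still retrying after $count failures"
}
```

For example, `model_fence_threshold 3 stonith other other stonith` prints `Too many failures (4), giving up`: the two failures whose transitions were aborted for other reasons push the count past the limit without triggering the check. In the model, as in Ken's explanation, "giving up" only stops the immediate retry; a later transition would start over.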
> 
> >> After the 11th fencing failure, "Too many failures (10) to fence" is logged.
> >> Incidentally, stonith-max-attempts has not been set, so it defaults to 10.
> >>
> >> [root at x3650f log]# egrep "Requesting fencing|error: Operation reboot|Stonith failed|Too many failures"
> >> ##Requesting fencing : 1st time
> >> May 12 05:51:47 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> >> May 12 05:52:52 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.8415167d: No data available
> >> May 12 05:52:52 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> >> ## 2nd time
> >> May 12 05:52:52 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> >> May 12 05:53:56 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.53d3592a: No data available
> >> May 12 05:53:56 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> >> ## 3rd time
> >> May 12 05:53:56 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> >> May 12 05:55:01 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.9177cb76: No data available
> >> May 12 05:55:01 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> >> ## 4th time
> >> May 12 05:55:01 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> >> May 12 05:56:05 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.946531cb: No data available
> >> May 12 05:56:05 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> >> ## 5th time
> >> May 12 05:56:05 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> >> May 12 05:57:10 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.278b3c4b: No data available
> >> May 12 05:57:10 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> >> ## 6th time
> >> May 12 05:57:10 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> >> May 12 05:58:14 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.7a49aebb: No data available
> >> May 12 05:58:14 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> >> ## 7th time
> >> May 12 05:58:14 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> >> May 12 05:59:19 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.83421862: No data available
> >> May 12 05:59:19 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> >> ## 8th time
> >> May 12 05:59:19 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> >> May 12 06:00:24 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.afd7ef98: No data available
> >> May 12 06:00:24 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> >> ## 9th time
> >> May 12 06:00:24 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> >> May 12 06:01:28 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.3b033dbe: No data available
> >> May 12 06:01:28 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> >> ## 10th time
> >> May 12 06:01:28 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> >> May 12 06:02:33 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.5447a345: No data available
> >> May 12 06:02:33 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> >> ## 11th time
> >> May 12 06:02:33 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> >> May 12 06:03:37 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.db50c21a: No data available
> >> May 12 06:03:37 rhel73-1 crmd[5269]: warning: Too many failures (10) to fence rhel73-2, giving up
> >> May 12 06:03:37 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> >>
> >> Regards,
> >> Kazunori INOUE
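To check counts like the ones in the log above mechanically, a small shell function can compare the number of failed reboot operations logged by stonith-ng with the N that crmd reports in its "Too many failures (N) to fence" warning. This is a sketch, not a Pacemaker tool: the function name `count_fence_failures` is hypothetical, and it assumes each failure is logged on a single line in the format shown in the excerpt.

```shell
# Hypothetical helper (not part of Pacemaker): compare the number of
# failed reboot operations that stonith-ng logged for NODE with the count
# crmd reports in its "Too many failures (N) to fence NODE" warning.
# Assumes one log line per failure, in the format shown in the excerpt.
count_fence_failures() {
    logfile=$1
    node=$2

    # Each failed attempt produces one "error: Operation reboot of NODE" line.
    # -F treats the pattern as a fixed string, so dots in node names are literal.
    failures=$(grep -cF "error: Operation reboot of $node " "$logfile")

    # Extract N from the last crmd warning for NODE, if one was logged.
    reported=$(sed -n "s/.*Too many failures (\([0-9]*\)) to fence $node,.*/\1/p" "$logfile" | tail -n 1)

    echo "failed=$failures reported=${reported:-none}"
}
```

For example, `count_fence_failures /var/log/messages rhel73-2` (the log path is an assumption; it depends on the syslog setup). Run against a log containing only the excerpt above, it would report 11 failed operations against a reported count of 10.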
> 
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

