[ClusterLabs] Pacemaker's "stonith too many failures" log is not accurate

Klaus Wenninger kwenning at redhat.com
Wed May 17 11:56:41 CEST 2017


On 05/17/2017 11:28 AM, 井上 和徳 wrote:
> Hi,
> I'm testing Pacemaker-1.1.17-rc1.
> The number of failures in "Too many failures (10) to fence" log does not match the number of actual failures.

Well, it kind of does: after 10 failures it doesn't try fencing again,
so that is what the failure count stays at ;-)
Of course it still sees the need to fence, but it doesn't actually try.
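
To compare the reported number with what the log actually shows, the lines can be tallied mechanically. The following is only a minimal sketch in Python: the function name `summarize` and the regular expressions are my own and are derived solely from the message formats quoted in this thread, so other Pacemaker versions may word these messages differently.

```python
import re

# Patterns derived from the log lines quoted in this thread (an assumption:
# other Pacemaker versions may format these messages differently).
REQUEST = re.compile(r"notice: Requesting fencing \((\w+)\) of node (\S+)")
FAILURE = re.compile(r"error: Operation (\w+) of (\S+) by ")
GIVE_UP = re.compile(r"warning: Too many failures \((\d+)\) to fence (\S+), giving up")


def summarize(lines, node):
    """Return (requests, failures, reported) for `node`.

    requests: number of "Requesting fencing" lines for the node
    failures: number of "error: Operation ..." lines for the node
    reported: counts printed by "Too many failures (N)" warnings
    """
    requests = 0
    failures = 0
    reported = []
    for line in lines:
        m = REQUEST.search(line)
        if m and m.group(2) == node:
            requests += 1
            continue
        m = FAILURE.search(line)
        if m and m.group(2) == node:
            failures += 1
            continue
        m = GIVE_UP.search(line)
        if m and m.group(2) == node:
            reported.append(int(m.group(1)))
    return requests, failures, reported


if __name__ == "__main__":
    import sys

    # Usage: python tally.py rhel73-2 < logfile
    print(summarize(sys.stdin, sys.argv[1]))
```

Fed the excerpt quoted below with node `rhel73-2`, this should give 11 requests, 11 failures and a single reported count of 10, which is the mismatch being described.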

Regards,
Klaus

>
> After the 11th fencing failure, "Too many failures (10) to fence" is logged.
> Incidentally, stonith-max-attempts has not been set, so it uses the default of 10.
>
> [root at x3650f log]# egrep "Requesting fencing|error: Operation reboot|Stonith failed|Too many failures"
> ##Requesting fencing : 1st time
> May 12 05:51:47 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> May 12 05:52:52 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.8415167d: No data available
> May 12 05:52:52 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> ## 2nd time
> May 12 05:52:52 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> May 12 05:53:56 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.53d3592a: No data available
> May 12 05:53:56 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> ## 3rd time
> May 12 05:53:56 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> May 12 05:55:01 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.9177cb76: No data available
> May 12 05:55:01 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> ## 4th time
> May 12 05:55:01 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> May 12 05:56:05 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.946531cb: No data available
> May 12 05:56:05 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> ## 5th time
> May 12 05:56:05 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> May 12 05:57:10 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.278b3c4b: No data available
> May 12 05:57:10 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> ## 6th time
> May 12 05:57:10 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> May 12 05:58:14 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.7a49aebb: No data available
> May 12 05:58:14 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> ## 7th time
> May 12 05:58:14 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> May 12 05:59:19 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.83421862: No data available
> May 12 05:59:19 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> ## 8th time
> May 12 05:59:19 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> May 12 06:00:24 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.afd7ef98: No data available
> May 12 06:00:24 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> ## 9th time
> May 12 06:00:24 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> May 12 06:01:28 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.3b033dbe: No data available
> May 12 06:01:28 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> ## 10th time
> May 12 06:01:28 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> May 12 06:02:33 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.5447a345: No data available
> May 12 06:02:33 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
> ## 11th time
> May 12 06:02:33 rhel73-1 crmd[5269]:  notice: Requesting fencing (reboot) of node rhel73-2
> May 12 06:03:37 rhel73-1 stonith-ng[5265]:   error: Operation reboot of rhel73-2 by rhel73-1 for crmd.5269 at rhel73-1.db50c21a: No data available
> May 12 06:03:37 rhel73-1 crmd[5269]: warning: Too many failures (10) to fence rhel73-2, giving up
> May 12 06:03:37 rhel73-1 crmd[5269]:  notice: Transition aborted: Stonith failed
>
> Regards,
> Kazunori INOUE



