[Pacemaker] Fail-count and failure timeout

Andrew Beekhof andrew at beekhof.net
Tue Oct 5 05:08:48 EDT 2010


On Tue, Oct 5, 2010 at 11:07 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
> On Fri, Oct 1, 2010 at 3:40 PM,  <Holger.Teutsch at fresenius-netcare.com> wrote:
>> Hi,
>> I observed the following in Pacemaker version 1.1.3 and tip up to patch
>> 10258.
>>
>> In a small test environment to study fail-count behavior I have one resource
>> of type "anything" (ocf:heartbeat) running "sleep 600", with a monitor
>> interval of 10 seconds.
>>
>> The failure-timeout is 300.
>>
>> I would expect to never see a failcount higher than 1.
>
> Why?
>
> The fail-count is only reset when the PE runs... which is on a failure
> and/or after the cluster-recheck-interval.
> So I'd expect a maximum of two.

Actually, what I wrote above is wrong.
There is no maximum: the fail-count is only cleared if at least 300s (the
failure-timeout) have passed since the last failure at the time the PE runs.
And since the PE only runs when the resource fails, the count is never reset.
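
To illustrate the reasoning above, here is a toy model in Python. It is only
a sketch, not Pacemaker code: simulate_fail_count and its parameters are made
up for this example, and it assumes the PE clears the fail-count when at least
failure-timeout seconds have passed since the last failure at the moment it
runs.

```python
def simulate_fail_count(fail_period, failure_timeout, n_failures, extra_pe_runs=()):
    """Return the fail-count seen right after each failure.

    Toy model (assumption, not Pacemaker's implementation): the PE runs at
    every failure and at the hypothetical extra times in extra_pe_runs.
    All times are in seconds.
    """
    fail_times = {i * fail_period for i in range(1, n_failures + 1)}
    fail_count = 0
    last_failure = None
    after_each_failure = []

    for now in sorted(fail_times | set(extra_pe_runs)):
        if now in fail_times:
            # The failure is recorded before the PE looks at the count.
            fail_count += 1
            last_failure = now

        # PE run: expire the count only if the last failure is old enough.
        if last_failure is not None and now - last_failure >= failure_timeout:
            fail_count = 0

        if now in fail_times:
            after_each_failure.append(fail_count)

    return after_each_failure


# Values from this thread: a failure every 600s, failure-timeout of 300s.
print(simulate_fail_count(600, 300, 4))                       # [1, 2, 3, 4]
# Same, plus one hypothetical PE run 350s after the first failure.
print(simulate_fail_count(600, 300, 4, extra_pe_runs=[950]))  # [1, 1, 2, 3]
```

When the PE runs only on failures, the last failure is always "now", so the
count just keeps climbing. A single PE run that happens more than 300s after
a failure is enough to clear it once.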

>
>       cluster-recheck-interval = time [15min]
>              Polling interval for time based changes to options,
>              resource parameters and constraints.
>
>              The Cluster is primarily event driven, however the
>              configuration can have elements that change based on time.
>              To ensure these changes take effect, we can optionally poll
>              the cluster’s status for changes. Allowed values: Zero
>              disables polling. Positive values are an interval in seconds
>              (unless other SI units are specified, e.g. 5min).
>
>
>
>>
>> I observed some sporadic clears, but mostly the count is increasing by 1
>> every 10 minutes.
>>
>> Am I mistaken or is this a bug?
>
> Hard to say without logs.  What value did it reach?
>
>>
>> Regards
>> Holger
>>
>> -- complete cib for reference ---
>>
>> <cib epoch="32" num_updates="0" admin_epoch="0"
>> validate-with="pacemaker-1.2" crm_feature_set="3.0.4" have-quorum="0"
>> cib-last-written="Fri Oct  1 14:17:31 2010" dc-uuid="hotlx">
>>   <configuration>
>>     <crm_config>
>>       <cluster_property_set id="cib-bootstrap-options">
>>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
>> value="1.1.3-09640bd6069e677d5eed65203a6056d9bf562e67"/>
>>         <nvpair id="cib-bootstrap-options-cluster-infrastructure"
>> name="cluster-infrastructure" value="openais"/>
>>         <nvpair id="cib-bootstrap-options-expected-quorum-votes"
>> name="expected-quorum-votes" value="2"/>
>>         <nvpair id="cib-bootstrap-options-no-quorum-policy"
>> name="no-quorum-policy" value="ignore"/>
>>         <nvpair id="cib-bootstrap-options-stonith-enabled"
>> name="stonith-enabled" value="false"/>
>>         <nvpair id="cib-bootstrap-options-start-failure-is-fatal"
>> name="start-failure-is-fatal" value="false"/>
>>         <nvpair id="cib-bootstrap-options-last-lrm-refresh"
>> name="last-lrm-refresh" value="1285926879"/>
>>       </cluster_property_set>
>>     </crm_config>
>>     <nodes>
>>       <node id="hotlx" uname="hotlx" type="normal"/>
>>     </nodes>
>>     <resources>
>>       <primitive class="ocf" id="test" provider="heartbeat" type="anything">
>>         <meta_attributes id="test-meta_attributes">
>>           <nvpair id="test-meta_attributes-target-role" name="target-role"
>> value="started"/>
>>           <nvpair id="test-meta_attributes-failure-timeout"
>> name="failure-timeout" value="300"/>
>>         </meta_attributes>
>>         <operations id="test-operations">
>>           <op id="test-op-monitor-10" interval="10" name="monitor"
>> on-fail="restart" timeout="20s"/>
>>           <op id="test-op-start-0" interval="0" name="start"
>> on-fail="restart" timeout="20s"/>
>>         </operations>
>>         <instance_attributes id="test-instance_attributes">
>>           <nvpair id="test-instance_attributes-binfile" name="binfile"
>> value="sleep 600"/>
>>         </instance_attributes>
>>       </primitive>
>>     </resources>
>>     <constraints/>
>>   </configuration>
>> </cib>
>>
>>
>



