[ClusterLabs] Fwd: Resource failure

Mon Jul 27 05:52:10 UTC 2015

On 27/07/15 01:35 AM, Vijay Partha wrote:
> Hi,
> 
> My configuration file looks like this:
> 
> <cib crm_feature_set="3.0.9" validate-with="pacemaker-2.0" epoch="38"
> num_updates="0" admin_epoch="0" cib-last-written="Fri Jul 24 15:57:06
> 2015" have-quorum="1" dc-uuid="node2">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
> value="1.1.11-97629de"/>
>         <nvpair id="cib-bootstrap-options-cluster-infrastructure"
> name="cluster-infrastructure" value="cman"/>
>         <nvpair id="cib-bootstrap-options-stonith-enabled"
> name="stonith-enabled" value="false"/>
>         <nvpair id="cib-bootstrap-options-no-quorum-policy"
> name="no-quorum-policy" value="ignore"/>
>         <nvpair id="cib-bootstrap-options-cluster-recheck-interval"
> name="cluster-recheck-interval" value="2s"/>
>       </cluster_property_set>
>     </crm_config>
>     <nodes>
>       <node id="node1" uname="node1"/>
>       <node id="node2" uname="node2"/>
>     </nodes>
>     <resources>
>       <primitive class="ocf" id="my_first_svc" provider="heartbeat"
> type="Dummy">
>         <instance_attributes id="my_first_svc-instance_attributes"/>
>         <operations>
>           <op id="my_first_svc-start-timeout-20" interval="0s"
> name="start" timeout="20"/>
>           <op id="my_first_svc-stop-timeout-20" interval="0s"
> name="stop" timeout="20"/>
>           <op id="my_first_svc-monitor-interval-120s" interval="120s"
> name="monitor"/>
>         </operations>
>       </primitive>
>       <primitive class="ocf" id="WebSite" provider="heartbeat"
> type="apache">
>         <instance_attributes id="WebSite-instance_attributes">
>           <nvpair id="WebSite-instance_attributes-configfile"
> name="configfile" value="/etc/httpd/conf/httpd.conf"/>
>           <nvpair id="WebSite-instance_attributes-statusurl"
> name="statusurl" value="http://localhost/server-status"/>
>         </instance_attributes>
>         <operations>
>           <op id="WebSite-start-timeout-40s" interval="0s" name="start"
> timeout="40s" on-fail="restart"/>
>           <op id="WebSite-stop-timeout-60s" interval="0s" name="stop"
> timeout="60s" on-fail="restart"/>
>           <op id="WebSite-monitor-interval-1min" interval="1min"
> name="monitor" on-fail="restart"/>
>         </operations>
>         <meta_attributes id="WebSite-meta_attributes"/>
>       </primitive>
>     </resources>
>     <constraints>
>       <rsc_location id="location-WebSite-node2-50" node="node2"
> rsc="WebSite" score="50"/>
>     </constraints>
>     <rsc_defaults>
>       <meta_attributes id="rsc_defaults-options">
>         <nvpair id="rsc_defaults-options-migration-threshold"
> name="migration-threshold" value="1"/>
>       </meta_attributes>
>     </rsc_defaults>
>     <op_defaults>
>       <meta_attributes id="op_defaults-options">
>         <nvpair id="op_defaults-options-timeout" name="timeout"
> value="240s"/>
>       </meta_attributes>
>     </op_defaults>
>   </configuration>
> </cib>
> 
> Once I stop the httpd service, Pacemaker does not restart it
> automatically.

As mentioned, logs help a lot. Please post the logs from all nodes,
starting shortly before you trigger the failure and continuing until
the logs stop printing new entries.

Also, you must use stonith. Please configure and test it. Often problems
go away when stonith is configured and working properly.
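
As a rough sketch only: once a fence device has been defined and tested
(not shown here, since the agent and its parameters depend entirely on
your hardware, and your CIB above has none), enabling stonith means
changing the existing property in your crm_config section from "false"
to "true". The nvpair id below is copied from your configuration:

```xml
<!-- Assumption: a working, tested fence device resource already exists.
     Without one, the cluster will refuse to start resources when
     stonith-enabled is true. -->
<nvpair id="cib-bootstrap-options-stonith-enabled"
        name="stonith-enabled" value="true"/>
```

Treat this as an illustration of the one property that changes, not as
a complete fencing setup.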

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?