[ClusterLabs] Fwd: Resource failure

Mon Jul 27 07:21:42 UTC 2015

Stonith, in short, puts a node that has entered an unknown state (or
on which a service has entered an unknown state) into a known state,
usually by force-rebooting the node. How this is actually done depends
on what hardware (or hypervisor) your nodes are using.

So for me to offer any additional advice, you need to tell us what your
nodes are built on.
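
As an illustration only, here is roughly what a fence device looks like
in the CIB once it is configured. This sketch assumes the nodes have
IPMI-capable BMCs and uses the fence_ipmilan agent; the resource id, the
address, and the credentials are placeholders made up for the example,
and the right agent and parameters for your nodes may well be different.

```xml
<!-- Hypothetical example: assumes node1 has an IPMI BMC reachable at 192.0.2.10 -->
<primitive class="stonith" id="fence_node1_ipmi" type="fence_ipmilan">
  <instance_attributes id="fence_node1_ipmi-instance_attributes">
    <!-- Which cluster node this device is able to fence -->
    <nvpair id="fence_node1_ipmi-instance_attributes-pcmk_host_list"
            name="pcmk_host_list" value="node1"/>
    <!-- Placeholder BMC address and credentials; replace with real values -->
    <nvpair id="fence_node1_ipmi-instance_attributes-ipaddr"
            name="ipaddr" value="192.0.2.10"/>
    <nvpair id="fence_node1_ipmi-instance_attributes-login"
            name="login" value="admin"/>
    <nvpair id="fence_node1_ipmi-instance_attributes-passwd"
            name="passwd" value="secret"/>
  </instance_attributes>
  <operations>
    <op id="fence_node1_ipmi-monitor-interval-60s" interval="60s"
        name="monitor"/>
  </operations>
</primitive>
```

A matching device would be needed for node2, and stonith-enabled in
cib-bootstrap-options would have to be set to "true" before the cluster
will use either one.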

On 27/07/15 02:00 AM, Vijay Partha wrote:
> Could you help me configure stonith properly? I am new to
> Pacemaker and have been working with it for only a few days. Which
> logs do you require?
> 
> On Mon, Jul 27, 2015 at 11:22 AM, Digimer <lists at alteeve.ca
> <mailto:lists at alteeve.ca>> wrote:
> 
>     On 27/07/15 01:35 AM, Vijay Partha wrote:
>     > Hi.
>     >
>     > My configuration file looks like this:
>     >
>     > <cib crm_feature_set="3.0.9" validate-with="pacemaker-2.0" epoch="38"
>     > num_updates="0" admin_epoch="0" cib-last-written="Fri Jul 24 15:57:06
>     > 2015" have-quorum="1" dc-uuid="node2">
>     >   <configuration>
>     >     <crm_config>
>     >       <cluster_property_set id="cib-bootstrap-options">
>     >         <nvpair id="cib-bootstrap-options-dc-version"
>     > name="dc-version"
>     > value="1.1.11-97629de"/>
>     >         <nvpair id="cib-bootstrap-options-cluster-infrastructure"
>     > name="cluster-infrastructure" value="cman"/>
>     >         <nvpair id="cib-bootstrap-options-stonith-enabled"
>     > name="stonith-enabled" value="false"/>
>     >         <nvpair id="cib-bootstrap-options-no-quorum-policy"
>     > name="no-quorum-policy" value="ignore"/>
>     >         <nvpair id="cib-bootstrap-options-cluster-recheck-interval"
>     > name="cluster-recheck-interval" value="2s"/>
>     >       </cluster_property_set>
>     >     </crm_config>
>     >     <nodes>
>     >       <node id="node1" uname="node1"/>
>     >       <node id="node2" uname="node2"/>
>     >     </nodes>
>     >     <resources>
>     >       <primitive class="ocf" id="my_first_svc" provider="heartbeat"
>     > type="Dummy">
>     >         <instance_attributes id="my_first_svc-instance_attributes"/>
>     >         <operations>
>     >           <op id="my_first_svc-start-timeout-20" interval="0s"
>     > name="start" timeout="20"/>
>     >           <op id="my_first_svc-stop-timeout-20" interval="0s"
>     > name="stop" timeout="20"/>
>     >           <op id="my_first_svc-monitor-interval-120s" interval="120s"
>     > name="monitor"/>
>     >         </operations>
>     >       </primitive>
>     >       <primitive class="ocf" id="WebSite" provider="heartbeat"
>     > type="apache">
>     >         <instance_attributes id="WebSite-instance_attributes">
>     >           <nvpair id="WebSite-instance_attributes-configfile"
>     > name="configfile" value="/etc/httpd/conf/httpd.conf"/>
>     >           <nvpair id="WebSite-instance_attributes-statusurl"
>     > name="statusurl" value="http://localhost/server-status"/>
>     >         </instance_attributes>
>     >         <operations>
>     >           <op id="WebSite-start-timeout-40s" interval="0s"
>     > name="start"
>     > timeout="40s" on-fail="restart"/>
>     >           <op id="WebSite-stop-timeout-60s" interval="0s" name="stop"
>     > timeout="60s" on-fail="restart"/>
>     >           <op id="WebSite-monitor-interval-1min" interval="1min"
>     > name="monitor" on-fail="restart"/>
>     >         </operations>
>     >         <meta_attributes id="WebSite-meta_attributes"/>
>     >       </primitive>
>     >     </resources>
>     >     <constraints>
>     >       <rsc_location id="location-WebSite-node2-50" node="node2"
>     > rsc="WebSite" score="50"/>
>     >     </constraints>
>     >     <rsc_defaults>
>     >       <meta_attributes id="rsc_defaults-options">
>     >         <nvpair id="rsc_defaults-options-migration-threshold"
>     > name="migration-threshold" value="1"/>
>     >       </meta_attributes>
>     >     </rsc_defaults>
>     >     <op_defaults>
>     >       <meta_attributes id="op_defaults-options">
>     >         <nvpair id="op_defaults-options-timeout" name="timeout"
>     > value="240s"/>
>     >       </meta_attributes>
>     >     </op_defaults>
>     >   </configuration>
>     > </cib>
>     >
>     > Once I stop the httpd service, Pacemaker does not restart it
>     > automatically.
> 
>     As mentioned, logs help a lot. Please send the logs from all nodes,
>     starting just before you trigger the failure and continuing until the
>     logs stop printing.
> 
>     Also, you must use stonith. Please configure and test it. Often problems
>     go away when stonith is configured and working properly.
> 
>     --
>     Digimer
>     Papers and Projects: https://alteeve.ca/w/
>     What if the cure for cancer is trapped in the mind of a person without
>     access to education?
> 
>     _______________________________________________
>     Users mailing list: Users at clusterlabs.org <mailto:Users at clusterlabs.org>
>     http://clusterlabs.org/mailman/listinfo/users
> 
>     Project Home: http://www.clusterlabs.org
>     Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>     Bugs: http://bugs.clusterlabs.org
> 
> 
> 
> 
> -- 
> With Regards
> P.Vijay
> 

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?