[Pacemaker] Problem in Stonith configuration

neha chatrath nehachatrath at gmail.com
Tue Oct 18 02:38:26 EDT 2011


Hello Andreas,

Thanks for the reply.

So can you please suggest which STONITH plugin I should use for the
production release of my software? I have the following system requirements:
1. If a node in the cluster fails, it should be rebooted and the resources should
restart on that node.
2. If the physical link between the nodes in the cluster fails, then that node
should be isolated (a kind of power down) and the resources should continue
to run on the other nodes.

I have different types of resources, e.g. primitive, master-slave and clone,
running on my system.
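
For production use, requirements like these usually call for a fencing device with
its own power path, e.g. IPMI/iLO/DRAC management boards or a switched PDU, rather
than a software-only plugin. Purely as an illustration, and assuming IPMI-capable
hardware (the addresses, credentials and exact parameter names below are placeholders
and depend on your boards and plugin version), a two-node setup with the external/ipmi
plugin could look roughly like this:

# one fencing resource per node to be fenced (all values are placeholders)
primitive st-mcg1 stonith:external/ipmi \
        params hostname="mcg1" ipaddr="192.168.1.210" userid="admin" passwd="secret" interface="lan" \
        op monitor interval="60s"
primitive st-mcg2 stonith:external/ipmi \
        params hostname="mcg2" ipaddr="192.168.1.211" userid="admin" passwd="secret" interface="lan" \
        op monitor interval="60s"
# keep each node from running the device that would fence itself
location l-st-mcg1 st-mcg1 -inf: mcg1
location l-st-mcg2 st-mcg2 -inf: mcg2

With out-of-band power fencing like this, a node that loses the cluster link can be
powered off by its peer, which covers requirement 2 above.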

Thanks and regards
Neha Chatrath


Date: Mon, 17 Oct 2011 15:08:16 +0200
From: Andreas Kurz <andreas at hastexo.com>
To: pacemaker at oss.clusterlabs.org
Subject: Re: [Pacemaker] Problem in Stonith configuration
Message-ID: <4E9C28C0.8070904 at hastexo.com>
Content-Type: text/plain; charset="iso-8859-1"

Hello,

On 10/17/2011 12:34 PM, neha chatrath wrote:
> Hello,
> I am configuring a 2 node cluster with following configuration:
>
> [root@MCG1 init.d]# crm configure show
>
> node $id="16738ea4-adae-483f-9d79-
b0ecce8050f4" mcg2 \
> attributes standby="off"
>
> node $id="3d507250-780f-414a-b674-8c8d84e345cd" mcg1 \
> attributes standby="off"
>
> primitive ClusterIP ocf:heartbeat:IPaddr \
> params ip="192.168.1.204" cidr_netmask="255.255.255.0" nic="eth0:1" \
> op monitor interval="40s" timeout="20s" \
> meta target-role="Started"
>
> primitive app1_fencing stonith:suicide \
> op monitor interval="90" \
> meta target-role="Started"
>
> primitive myapp1 ocf:heartbeat:Redundancy \
> op monitor interval="60s" role="Master" timeout="30s" on-fail="standby" \
> op monitor interval="40s" role="Slave" timeout="40s" on-fail="restart"
>
> primitive myapp2 ocf:mcg:Redundancy_myapp2 \
> op monitor interval="60" role="Master" timeout="30" on-fail="standby" \
> op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
>
> primitive myapp3 ocf:mcg:red_app3 \
> op monitor interval="60" role="Master" timeout="30" on-fail="fence" \
> op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
>
> ms ms_myapp1 myapp1 \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
> notify="true"
>
> ms ms_myapp2 myapp2 \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
> notify="true"
>
> ms ms_myapp3 myapp3 \
> meta master-max="1" master-max-node="1" clone-max="2" clone-node-max="1"
> notify="true"
>
> colocation myapp1_col inf: ClusterIP ms_myapp1:Master
>
> colocation myapp2_col inf: ClusterIP ms_myapp2:Master
>
> colocation myapp3_col inf: ClusterIP ms_myapp3:Master
>
> order myapp1_order inf: ms_myapp1:promote ClusterIP:start
>
> order myapp2_order inf: ms_myapp2:promote ms_myapp1:start
>
> order myapp3_order inf: ms_myapp3:promote ms_myapp2:start
>
> property $id="cib-bootstrap-options" \
> dc-version="1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1" \
> cluster-infrastructure="Heartbeat" \
> stonith-enabled="true" \
> no-quorum-policy="ignore"
>
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100" \
> migration-threshold="3"
>
> I start the Heartbeat daemon on only one of the nodes, e.g. mcg1, but none of the
> resources (myapp1, myapp2, etc.) gets started even on this node.
> Following is the output of the "crm_mon -f" command:
>
> Last updated: Mon Oct 17 10:19:22 2011
> Stack: Heartbeat
> Current DC: mcg1 (3d507250-780f-414a-b674-8c8d84e345cd)- partition with
> quorum
> Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
> 2 Nodes configured, unknown expected votes
> 5 Resources configured.
> ============
> Node mcg2 (16738ea4-adae-483f-9d79-b0ecce8050f4): UNCLEAN (offline)

The cluster is waiting for a successful fencing event before starting any
resources ... that is the only way it can be sure the second node is running no resources.
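
If you have verified by hand that the second node is really down, the crm shell
(assuming your version provides the subcommand) can acknowledge that instead of
waiting for a fencing event; the node name below is taken from the configuration
in this thread:

# assumes your crm shell version offers the "clearstate" subcommand
crm node clearstate mcg2

Only do this when you are certain that no resources are running on that node.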

Since you are using the suicide plugin, this will never happen as long as Heartbeat
is not started on that node. If this is only a _test_ setup, go with the ssh
or even the null stonith plugin ... but never use them on production systems!
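
For such a test-only setup, a minimal sketch using the external/ssh plugin could look
like the following (the resource and clone names are made up, and external/ssh requires
passwordless root ssh between the nodes; the null plugin, which only pretends to fence,
is configured the same way with a hostlist parameter):

# test setups only: external/ssh "fences" by sshing into the peer and rebooting it
primitive st-ssh stonith:external/ssh \
        params hostlist="mcg1 mcg2" \
        op monitor interval="60s"
# run a copy of the fencing resource on each node (clone name is arbitrary)
clone fencing-clone st-ssh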

Regards,
Andreas


On Mon, Oct 17, 2011 at 4:04 PM, neha chatrath <nehachatrath at gmail.com>wrote:

> Hello,
> I am configuring a 2 node cluster with following configuration:
>
> [root@MCG1 init.d]# crm configure show
>
> node $id="16738ea4-adae-483f-9d79-b0ecce8050f4" mcg2 \
> attributes standby="off"
>
> node $id="3d507250-780f-414a-b674-8c8d84e345cd" mcg1 \
> attributes standby="off"
>
> primitive ClusterIP ocf:heartbeat:IPaddr \
> params ip="192.168.1.204" cidr_netmask="255.255.255.0" nic="eth0:1" \
> op monitor interval="40s" timeout="20s" \
> meta target-role="Started"
>
> primitive app1_fencing stonith:suicide \
> op monitor interval="90" \
> meta target-role="Started"
>
> primitive myapp1 ocf:heartbeat:Redundancy \
> op monitor interval="60s" role="Master" timeout="30s" on-fail="standby" \
> op monitor interval="40s" role="Slave" timeout="40s" on-fail="restart"
>
> primitive myapp2 ocf:mcg:Redundancy_myapp2 \
> op monitor interval="60" role="Master" timeout="30" on-fail="standby" \
> op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
>
> primitive myapp3 ocf:mcg:red_app3 \
> op monitor interval="60" role="Master" timeout="30" on-fail="fence" \
> op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
>
> ms ms_myapp1 myapp1 \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
> notify="true"
>
> ms ms_myapp2 myapp2 \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
> notify="true"
>
> ms ms_myapp3 myapp3 \
> meta master-max="1" master-max-node="1" clone-max="2" clone-node-max="1"
> notify="true"
>
> colocation myapp1_col inf: ClusterIP ms_myapp1:Master
>
> colocation myapp2_col inf: ClusterIP ms_myapp2:Master
>
> colocation myapp3_col inf: ClusterIP ms_myapp3:Master
>
> order myapp1_order inf: ms_myapp1:promote ClusterIP:start
>
> order myapp2_order inf: ms_myapp2:promote ms_myapp1:start
>
> order myapp3_order inf: ms_myapp3:promote ms_myapp2:start
>
> property $id="cib-bootstrap-options" \
> dc-version="1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1" \
> cluster-infrastructure="Heartbeat" \
> stonith-enabled="true" \
> no-quorum-policy="ignore"
>
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100" \
> migration-threshold="3"
>
> I start the Heartbeat daemon on only one of the nodes, e.g. mcg1, but none of the
> resources (myapp1, myapp2, etc.) gets started even on this node.
> Following is the output of the "crm_mon -f" command:
>
> Last updated: Mon Oct 17 10:19:22 2011
> Stack: Heartbeat
> Current DC: mcg1 (3d507250-780f-414a-b674-8c8d84e345cd)- partition with
> quorum
> Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
> 2 Nodes configured, unknown expected votes
> 5 Resources configured.
> ============
> Node mcg2 (16738ea4-adae-483f-9d79-b0ecce8050f4): UNCLEAN (offline)
> Online: [ mcg1 ]
> app1_fencing    (stonith:suicide):    Started mcg1
>
> Migration summary:
> * Node mcg1:
>
> When I set "stonith-enabled" to false, all my resources come up.
>
> Can somebody help me with STONITH configuration?
>
> Cheers
> Neha Chatrath
>                           KEEP SMILING!!!!
>

