[Pacemaker] stonith pacemaker problem

Shravan Mishra shravan.mishra at gmail.com
Thu Oct 7 17:02:49 EDT 2010


Hi,

Description of my environment:
   corosync=1.2.8
   pacemaker=1.1.3
   Linux= 2.6.29.6-0.6.smp.gcc4.1.x86_64 #1 SMP


We are having a problem with Pacemaker: it is continuously
canceling the monitor operation of our stonith devices.

We ran:

stonith -d -t external/safe/ipmi hostname=ha2.itactics.com \
    ipaddr=192.168.2.7 userid=hellouser passwd=hello interface=lanplus -S

Its output is attached as stonith.output.

We have been trying to debug this issue for a few days now with no success.
We are hoping that someone can help us, as we are under immense
pressure to move to RCS unless we can solve this issue in a day or two,
which I personally don't want to do because we like the product.

Any help will be greatly appreciated.


Here is an excerpt from /var/log/messages:
=========================
Oct  7 16:58:29 ha1 lrmd: [3581]: info:
rsc:ha2.itactics.com-stonith:11155: start
Oct  7 16:58:29 ha1 lrmd: [3581]: info:
rsc:ha2.itactics.com-stonith:11156: monitor
Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
monitor[11156] on
stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
its parameters: CRM_meta_interval=[20000] target_role=[started]
ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
userid=[safe_ipmi_admin]  cancelled
Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11157: stop
Oct  7 16:58:29 ha1 lrmd: [3581]: info:
rsc:ha2.itactics.com-stonith:11158: start
Oct  7 16:58:29 ha1 lrmd: [3581]: info:
rsc:ha2.itactics.com-stonith:11159: monitor
Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
monitor[11159] on
stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
its parameters: CRM_meta_interval=[20000] target_role=[started]
ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
userid=[safe_ipmi_admin]  cancelled
Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11160: stop
Oct  7 16:58:29 ha1 lrmd: [3581]: info:
rsc:ha2.itactics.com-stonith:11161: start
Oct  7 16:58:29 ha1 lrmd: [3581]: info:
rsc:ha2.itactics.com-stonith:11162: monitor
Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
monitor[11162] on
stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
its parameters: CRM_meta_interval=[20000] target_role=[started]
ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
userid=[safe_ipmi_admin]  cancelled
Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11163: stop
Oct  7 16:58:29 ha1 lrmd: [3581]: info:
rsc:ha2.itactics.com-stonith:11164: start
Oct  7 16:58:29 ha1 lrmd: [3581]: info:
rsc:ha2.itactics.com-stonith:11165: monitor
Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
monitor[11165] on
stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
its parameters: CRM_meta_interval=[20000] target_role=[started]
ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
userid=[safe_ipmi_admin]  cancelled
Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11166: stop
Oct  7 16:58:29 ha1 lrmd: [3581]: info:
rsc:ha2.itactics.com-stonith:11167: start
Oct  7 16:58:29 ha1 lrmd: [3581]: info:
rsc:ha2.itactics.com-stonith:11168: monitor
Oct  7 16:58:30 ha1 lrmd: [3581]: info: cancel_op: operation
monitor[11168] on
stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
its parameters: CRM_meta_interval=[20000] target_role=[started]
ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
userid=[safe_ipmi_admin]  cancelled
Oct  7 16:58:30 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11169: stop
Oct  7 16:58:30 ha1 lrmd: [3581]: info:
rsc:ha2.itactics.com-stonith:11170: start
Oct  7 16:58:30 ha1 lrmd: [3581]: info: stonithRA plugin: got
metadata: <?xml version="1.0"?> <!DOCTYPE resource-agent SYSTEM
"ra-api-1.dtd"> <resource-agent name="external/safe/ipmi">
<version>1.0</version>   <longdesc lang="en"> ipmitool based power
management. Apparently, the power off method of ipmitool is
intercepted by ACPI which then makes a regular shutdown. If case of a
split brain on a two-node it may happen that no node survives. For
two-node clusters use only the reset method.    </longdesc>
<shortdesc lang="en">IPMI STONITH external device </shortdesc>
<parameters> <parameter name="hostname" unique="1"> <content
type="string" /> <shortdesc lang="en"> Hostname </shortdesc> <longdesc
lang="en"> The name of the host to be managed by this STONITH device.
</longdesc> </parameter>  <parameter name="ipaddr" unique="1">
<content type="string" /> <shortdesc lang="en"> IP Address
</shortdesc> <longdesc lang="en"> The IP address of the STONITH
device. </longdesc> </parameter>  <parameter name="userid" unique="1">
<content type="string" /> <shortdesc lang="en"> Login </shortdesc>
<longdesc lang="en"> The username used for logging in to the STONITH
device. </longdesc> </parameter>  <parameter name="passwd" unique="1">
<content type="string" /> <shortdesc lang="en"> Password </shortdesc>
<longdesc lang="en"> The password used for logging in to the STONITH
device. </longdesc> </parameter>  <parameter name="interface"
unique="1"> <content type="string" default="lan"/> <shortdesc
lang="en"> IPMI interface </shortdesc> <longdesc lang="en"> IPMI
interface to use, such as "lan" or "lanplus". </longdesc> </parameter>
 </parameters>    <actions>     <action name="start"   timeout="15" />
    <action name="stop"    timeout="15" />     <action name="status"
timeout="15" />     <action name="monitor" timeout="15" interval="15"
start-delay="15" />     <action name="meta-data"  timeout="15" />
</actions>   <special tag="heartbeat">     <version>2.0</version>
</special> </resource-agent>
Oct  7 16:58:30 ha1 lrmd: [3581]: info:
rsc:ha2.itactics.com-stonith:11171: monitor
Oct  7 16:58:30 ha1 lrmd: [3581]: info: cancel_op: operation
monitor[11171] on
stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
its parameters: CRM_meta_interval=[20000] target_role=[started]
ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
userid=[safe_ipmi_admin]  cancelled
Oct  7 16:58:30 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11172: stop
Oct  7 16:58:30 ha1 lrmd: [3581]: info:
rsc:ha2.itactics.com-stonith:11173: start
Oct  7 16:58:30 ha1 lrmd: [3581]: info:
rsc:ha2.itactics.com-stonith:11174: monitor

==========================
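
To put a number on the loop in the excerpt above, a short script can tally how
many monitor operations lrmd started and how many of them it then cancelled.
This is only a minimal sketch, assuming the log lines look like the ones quoted
here. The function name summarize_lrmd, the two regular expressions, and the
abbreviated SAMPLE text are illustrative and are not part of Pacemaker or lrmd.

```python
import re
from collections import Counter

# Operation announcements written by lrmd, e.g.
#   "rsc:ha2.itactics.com-stonith:11155: start"
OP_RE = re.compile(
    r"rsc:(?P<rsc>[^\s:]+):(?P<op_id>\d+):\s*(?P<op>start|stop|monitor)\b"
)
# Cancellation notices, e.g. "cancel_op: operation monitor[11156] on ..."
CANCEL_RE = re.compile(r"cancel_op:\s*operation\s+monitor\[(?P<op_id>\d+)\]")


def summarize_lrmd(log_text):
    """Count lrmd start/stop/monitor operations and cancelled monitors."""
    # The mail client wrapped long syslog lines, so collapse all whitespace
    # into single spaces before matching.
    flat = " ".join(log_text.split())

    ops = Counter()
    monitor_ids = set()
    for match in OP_RE.finditer(flat):
        ops[match.group("op")] += 1
        if match.group("op") == "monitor":
            monitor_ids.add(match.group("op_id"))

    cancelled_ids = {m.group("op_id") for m in CANCEL_RE.finditer(flat)}

    return {
        "ops": dict(ops),
        "monitors_started": len(monitor_ids),
        # Only count cancellations whose operation id was seen starting.
        "monitors_cancelled": len(monitor_ids & cancelled_ids),
    }


# One start/monitor/cancel/stop cycle, with the parameter list abbreviated.
SAMPLE = """\
Oct  7 16:58:29 ha1 lrmd: [3581]: info:
rsc:ha2.itactics.com-stonith:11155: start
Oct  7 16:58:29 ha1 lrmd: [3581]: info:
rsc:ha2.itactics.com-stonith:11156: monitor
Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
monitor[11156] on
stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
its parameters: CRM_meta_interval=[20000] cancelled
Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11157: stop
"""

if __name__ == "__main__":
    print(summarize_lrmd(SAMPLE))
```

Feeding it the full /var/log/messages text instead of SAMPLE would show whether
every monitor that gets started is cancelled again, which is what the excerpt
suggests.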

Thanks

Shravan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stonith.output
Type: application/octet-stream
Size: 6738 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20101007/a805c1e6/attachment-0002.obj>