[Pacemaker] stonith pacemaker problem

Andrew Beekhof andrew at beekhof.net
Sun Oct 10 18:46:38 UTC 2010


Not enough information.
We'd need more than just the lrmd's logs; they only show what happened, not why.
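
To illustrate the point, here is a rough Python sketch that tallies the operations in an lrmd excerpt like the one quoted below. It is not part of Pacemaker; the name summarize_lrmd, the regular expressions, and the shortened SAMPLE text are assumptions made for illustration, based only on the line shapes visible in the quoted log. It strips the mail quoting, rejoins the wrapped lines, and reports which monitor operations were cancelled.

```python
import re
from collections import Counter

# Patterns inferred from the quoted excerpt only (an assumption, not a
# documented lrmd log format).
OP_RE = re.compile(r"rsc:(?P<rsc>\S+):(?P<id>\d+): (?P<op>start|stop|monitor)\b")
CANCEL_RE = re.compile(r"cancel_op: operation monitor\[(?P<id>\d+)\]")

# Shortened, hypothetical sample in the same shape as the quoted log.
SAMPLE = """\
> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11155: start
> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11156: monitor
> Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
> monitor[11156] on
> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
> its parameters: CRM_meta_interval=[20000]  cancelled
> Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11157: stop
"""


def summarize_lrmd(text):
    """Return (operation counts, cancelled monitor ids, all monitor ids)."""
    # Drop the "> " mail quoting and rejoin lines wrapped by the mail client.
    flat = " ".join(line.lstrip("> ").strip() for line in text.splitlines())
    matches = list(OP_RE.finditer(flat))
    ops = Counter(m.group("op") for m in matches)
    monitors = {int(m.group("id")) for m in matches if m.group("op") == "monitor"}
    cancelled = {int(m.group("id")) for m in CANCEL_RE.finditer(flat)}
    return ops, cancelled, monitors


if __name__ == "__main__":
    ops, cancelled, monitors = summarize_lrmd(SAMPLE)
    print("operation counts:", dict(ops))
    print("monitors cancelled:", sorted(cancelled))
    print("monitors not cancelled:", sorted(monitors - cancelled))
```

A tally like this only confirms the start/monitor/cancel/stop cycle. The reason for each cancel is decided elsewhere in the cluster stack, so it will not appear in these lines.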

On Thu, Oct 7, 2010 at 11:02 PM, Shravan Mishra
<shravan.mishra at gmail.com> wrote:
> Hi,
>
> Description of my environment:
>   corosync=1.2.8
>   pacemaker=1.1.3
>   Linux= 2.6.29.6-0.6.smp.gcc4.1.x86_64 #1 SMP
>
>
> We are having a problem with Pacemaker, which is continuously
> cancelling the monitor operation of our stonith devices.
>
> We ran:
>
> stonith -d -t external/safe/ipmi hostname=ha2.itactics.com
> ipaddr=192.168.2.7 userid=hellouser passwd=hello interface=lanplus -S
>
> Its output is attached as stonith.output.
>
> We have been trying to debug this issue for a few days now with no success.
> We are hoping that someone can help us, as we are under immense
> pressure to move to RCS unless we can solve this issue in a day or two,
> which I personally don't want to do because we like the product.
>
> Any help will be greatly appreciated.
>
>
> Here is an excerpt from /var/log/messages:
> =========================
> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11155: start
> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11156: monitor
> Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
> monitor[11156] on
> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
> its parameters: CRM_meta_interval=[20000] target_role=[started]
> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
> userid=[safe_ipmi_admin]  cancelled
> Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11157: stop
> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11158: start
> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11159: monitor
> Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
> monitor[11159] on
> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
> its parameters: CRM_meta_interval=[20000] target_role=[started]
> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
> userid=[safe_ipmi_admin]  cancelled
> Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11160: stop
> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11161: start
> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11162: monitor
> Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
> monitor[11162] on
> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
> its parameters: CRM_meta_interval=[20000] target_role=[started]
> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
> userid=[safe_ipmi_admin]  cancelled
> Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11163: stop
> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11164: start
> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11165: monitor
> Oct  7 16:58:29 ha1 lrmd: [3581]: info: cancel_op: operation
> monitor[11165] on
> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
> its parameters: CRM_meta_interval=[20000] target_role=[started]
> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
> userid=[safe_ipmi_admin]  cancelled
> Oct  7 16:58:29 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11166: stop
> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11167: start
> Oct  7 16:58:29 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11168: monitor
> Oct  7 16:58:30 ha1 lrmd: [3581]: info: cancel_op: operation
> monitor[11168] on
> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
> its parameters: CRM_meta_interval=[20000] target_role=[started]
> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
> userid=[safe_ipmi_admin]  cancelled
> Oct  7 16:58:30 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11169: stop
> Oct  7 16:58:30 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11170: start
> Oct  7 16:58:30 ha1 lrmd: [3581]: info: stonithRA plugin: got
> metadata: <?xml version="1.0"?> <!DOCTYPE resource-agent SYSTEM
> "ra-api-1.dtd"> <resource-agent name="external/safe/ipmi">
> <version>1.0</version>   <longdesc lang="en"> ipmitool based power
> management. Apparently, the power off method of ipmitool is
> intercepted by ACPI which then makes a regular shutdown. If case of a
> split brain on a two-node it may happen that no node survives. For
> two-node clusters use only the reset method.    </longdesc>
> <shortdesc lang="en">IPMI STONITH external device </shortdesc>
> <parameters> <parameter name="hostname" unique="1"> <content
> type="string" /> <shortdesc lang="en"> Hostname </shortdesc> <longdesc
> lang="en"> The name of the host to be managed by this STONITH device.
> </longdesc> </parameter>  <parameter name="ipaddr" unique="1">
> <content type="string" /> <shortdesc lang="en"> IP Address
> </shortdesc> <longdesc lang="en"> The IP address of the STONITH
> device. </longdesc> </parameter>  <parameter name="userid" unique="1">
> <content type="string" /> <shortdesc lang="en"> Login </shortdesc>
> <longdesc lang="en"> The username used for logging in to the STONITH
> device. </longdesc> </parameter>  <parameter name="passwd" unique="1">
> <content type="string" /> <shortdesc lang="en"> Password </shortdesc>
> <longdesc lang="en"> The password used for logging in to the STONITH
> device. </longdesc> </parameter>  <parameter name="interface"
> unique="1"> <content type="string" default="lan"/> <shortdesc
> lang="en"> IPMI interface </shortdesc> <longdesc lang="en"> IPMI
> interface to use, such as "lan" or "lanplus". </longdesc> </parameter>
>  </parameters>    <actions>     <action name="start"   timeout="15" />
>    <action name="stop"    timeout="15" />     <action name="status"
> timeout="15" />     <action name="monitor" timeout="15" interval="15"
> start-delay="15" />     <action name="meta-data"  timeout="15" />
> </actions>   <special tag="heartbeat">     <version>2.0</version>
> </special> </resource-agent>
> Oct  7 16:58:30 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11171: monitor
> Oct  7 16:58:30 ha1 lrmd: [3581]: info: cancel_op: operation
> monitor[11171] on
> stonith::external/safe/ipmi::ha2.itactics.com-stonith for client 3584,
> its parameters: CRM_meta_interval=[20000] target_role=[started]
> ipaddr=[192.168.2.7] interface=[lanplus] CRM_meta_timeout=[180000]
> crm_feature_set=[3.0.2] CRM_meta_name=[monitor]
> hostname=[ha2.itactics.com] passwd=[Ft01ST0pMF@]
> userid=[safe_ipmi_admin]  cancelled
> Oct  7 16:58:30 ha1 lrmd: [3581]: info: rsc:ha2.itactics.com-stonith:11172: stop
> Oct  7 16:58:30 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11173: start
> Oct  7 16:58:30 ha1 lrmd: [3581]: info:
> rsc:ha2.itactics.com-stonith:11174: monitor
>
> ==========================
>
> Thanks
>
> Shravan



