[Pacemaker] Bug? failed to stonith with fence_ipmilan on CentOS6.2

Andrew Beekhof andrew at beekhof.net
Tue Oct 15 22:16:24 UTC 2013


On 09/10/2013, at 1:53 PM, Xiaomin Zhang <zhangxiaomin at gmail.com> wrote:

> I think I know why this happened after I enabled 'verbose' for fence_ipmilan. 
> When I first configured stonith, I set lanplus to true; however, my machine is not an HP one, so lanplus is not supported. When I noticed this, I used 'crm configure load update' to set lanplus to false, and Pacemaker appeared to accept the change. I take this to mean that stonith-ng should use the new ipmitool command line from then on.
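> For reference, the update file I loaded was along these lines (a sketch; the file name is illustrative, and only lanplus changed from the original definition):
> 
> # update.crm -- illustrative file name; the same primitive with lanplus flipped
> primitive node2-stonith stonith:fence_ipmilan \
>         params pcmk_host_list="node2" pcmk_host_check="static-list" ipaddr="192.168.170.1" login="root" passwd="123" lanplus="false" power_wait="1"
> 
> loaded with 'crm configure load update update.crm'.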
> However, the strange behavior is that this change never took effect, even after I restarted the pacemaker service several times.

That's quite odd, I've never heard of that before.

> The way I finally resolved this was to delete every configured resource one by one and then configure the whole thing again.
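> Roughly the following, per resource (illustrative crmsh one-shot commands, not an exact transcript):
> 
> # stop and remove the old stonith definition
> crm resource stop node2-stonith
> crm configure delete node2-stonith
> # repeat for the other resources, then re-create everything,
> # this time with lanplus="false" in the definition from the start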
> P.S. The pacemaker version is pacemaker-cli-1.1.6-3.el6.x86_64, and fence-agents is 3.1.5-10.el6.x86_64.
> Is this a bug that has been fixed in a newer version?

Highly likely.
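
If it bites you again, running the agent by hand usually shows the real complaint. Something along these lines (untested sketch; add -P only if you want lanplus):

fence_ipmilan -a 192.168.170.1 -l root -p 123 -o status -v

You can also trigger a complete fence through stonith-ng with 'stonith_admin --reboot node2'.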

> Thanks.
> 
> 
> 
> On Wed, Oct 9, 2013 at 5:09 AM, Xiaomin Zhang <zhangxiaomin at gmail.com> wrote:
> Hi:
> I configured stonith on CentOS 6.2 with the fence_ipmilan agent:
> primitive node2-stonith stonith:fence_ipmilan \
>         params pcmk_host_list="node2" pcmk_host_check="static-list" ipaddr="192.168.170.1" login="root" passwd="123" lanplus="false" power_wait="1"
> 
> The IPMI IP address and credentials were verified to be correct with a raw ipmitool command.
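> Something like this (the exact invocation is approximate):
> 
> # plain 'lan' interface, since lanplus is not supported on this hardware
> ipmitool -I lan -H 192.168.170.1 -U root -P 123 chassis power status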
> 
> While testing stonith, I found that node1-stonith did not seem to work at all, and I also found some strange log entries on another node, which was expected to kill node1:
> 
> Oct  9 04:39:05 node1 stonith-ng: [3705]: info: stonith_fence: Exec <stonith_command t="stonith-ng" st_async_id="4ca92d0e-9a2a-4fdd-8968-c91eb89e8cbe" st_op="st_fence" st_callid="0" st_callopt="0" st_remote_op="4ca92d0e-9a2a-4fdd-8968-c91eb89e8cbe" st_target="node2" st_device_action="reboot" st_timeout="54000" src="node3" seq="12" />
> Oct  9 04:39:05 node1 stonith-ng: [3705]: info: can_fence_host_with_device: node2-stonith can fence node2: static-list
> Oct  9 04:39:05 node1 stonith-ng: [3705]: info: stonith_fence: Found 1 matching devices for 'node2'
> Oct  9 04:39:05 node1 stonith-ng: [3705]: info: stonith_command: Processed st_fence from node3: rc=-1
> Oct  9 04:39:05 node1 stonith-ng: [3705]: info: make_args: reboot-ing node 'node2' as 'port=node2'
> Oct  9 04:39:05 node1 crmd: [3710]: info: send_direct_ack: ACK'ing resource op drbd_hadoop:1_notify_0 from 77:4:0:ee8de687-92c9-4123-8efb-befd45814a3b: lrm_invoke-lrmd-1381264745-30
> Oct  9 04:39:05 node1 crmd: [3710]: info: process_lrm_event: LRM operation drbd_hadoop:1_notify_0 (call=20, rc=0, cib-update=0, confirmed=true) ok
> Oct  9 04:39:05 node1 stonith-ng: [3705]: ERROR: log_operation: Operation 'reboot' [22346] (call 0 from (null)) for host 'node2' with device 'node2-stonith' returned: -2
> Oct  9 04:39:05 node1 stonith-ng: [3705]: ERROR: log_operation: node2-stonith: Rebooting machine @ IPMI:192.168.170.1...Failed
> 
> The log shows that stonith failed with return value -2. What does this mean? Is there a configuration issue?
> Thanks.
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
