[Pacemaker] Bug? failed to stonith with fence_ipmilan on CentOS6.2

Tue Oct 8 22:53:39 EDT 2013

After enabling 'verbose' for fence_ipmilan, I think I know why this
happened.
When I first configured stonith, I set lanplus to true. However, my
machine is not an HP one, so lanplus is not supported. When I noticed
this, I used 'crm configure load update' to update the stonith resource
and set lanplus to false, and Pacemaker seemed to accept the change. I took
this to mean that stonith-ng would use the new ipmitool command line from
then on.
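For reference, the lanplus parameter selects which ipmitool LAN interface is used: `-I lanplus` (IPMI v2.0 / RMCP+) or `-I lan` (IPMI v1.5). The following is a minimal Python sketch, assuming fence_ipmilan maps a true lanplus to `-I lanplus` and anything else to `-I lan`; the helper name `build_ipmitool_argv` is hypothetical and not part of fence-agents. It builds the equivalent raw ipmitool command so each setting can be checked by hand.

```python
from typing import Dict, List


def build_ipmitool_argv(params: Dict[str, str], action: str = "status") -> List[str]:
    """Hypothetical helper: build an ipmitool argv resembling what fence_ipmilan runs.

    Assumption: lanplus="true"/"1"/"yes"/"on" selects the IPMI v2.0 "lanplus"
    interface; any other value falls back to the IPMI v1.5 "lan" interface.
    """
    lanplus = params.get("lanplus", "false").strip().lower() in ("true", "1", "yes", "on")
    interface = "lanplus" if lanplus else "lan"
    return [
        "ipmitool",
        "-I", interface,          # LAN interface type
        "-H", params["ipaddr"],   # BMC address
        "-U", params["login"],    # IPMI user
        "-P", params["passwd"],   # IPMI password
        "chassis", "power", action,
    ]


if __name__ == "__main__":
    # Parameters taken from the primitive quoted in this thread.
    params = {"ipaddr": "192.168.170.1", "login": "root", "passwd": "123", "lanplus": "false"}
    print(" ".join(build_ipmitool_argv(params)))
    # prints: ipmitool -I lan -H 192.168.170.1 -U root -P 123 chassis power status
```

Running the printed command directly is one way to confirm whether the BMC answers on the chosen interface before blaming the cluster configuration.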
However, the strange thing is that this configuration never took effect,
even after I restarted the pacemaker service several times.
I finally resolved this by deleting all configured resources one by one
and then configuring everything again.
P.S. The versions are pacemaker-cli-1.1.6-3.el6.x86_64 and
fence-agents-3.1.5-10.el6.x86_64.
Is this a bug that has been fixed in a newer version?
Thanks.

On Wed, Oct 9, 2013 at 5:09 AM, Xiaomin Zhang <zhangxiaomin at gmail.com> wrote:

> Hi:
> I configured stonith on CentOS 6.2 with the fence_ipmilan agent:
> primitive node2-stonith stonith:fence_ipmilan \
>         params pcmk_host_list="node2" pcmk_host_check="static-list" \
>         ipaddr="192.168.170.1" login="root" passwd="123" lanplus="false" \
>         power_wait="1"
>
> The IPMI IP address and credentials have been verified to be correct with
> a raw ipmitool command.
>
> While testing stonith, I found that node1-stonith does not seem to work at
> all, and I also found some strange log entries on another node, which is
> expected to kill node1:
>
> Oct  9 04:39:05 node1 stonith-ng: [3705]: info: stonith_fence: Exec <stonith_command t="stonith-ng" st_async_id="4ca92d0e-9a2a-4fdd-8968-c91eb89e8cbe" st_op="st_fence" st_callid="0" st_callopt="0" st_remote_op="4ca92d0e-9a2a-4fdd-8968-c91eb89e8cbe" st_target="node2" st_device_action="reboot" st_timeout="54000" src="node3" seq="12" />
> Oct  9 04:39:05 node1 stonith-ng: [3705]: info: can_fence_host_with_device: node2-stonith can fence node2: static-list
> Oct  9 04:39:05 node1 stonith-ng: [3705]: info: stonith_fence: Found 1 matching devices for 'node2'
> Oct  9 04:39:05 node1 stonith-ng: [3705]: info: stonith_command: Processed st_fence from node3: rc=-1
> Oct  9 04:39:05 node1 stonith-ng: [3705]: info: make_args: reboot-ing node 'node2' as 'port=node2'
> Oct  9 04:39:05 node1 crmd: [3710]: info: send_direct_ack: ACK'ing resource op drbd_hadoop:1_notify_0 from 77:4:0:ee8de687-92c9-4123-8efb-befd45814a3b: lrm_invoke-lrmd-1381264745-30
> Oct  9 04:39:05 node1 crmd: [3710]: info: process_lrm_event: LRM operation drbd_hadoop:1_notify_0 (call=20, rc=0, cib-update=0, confirmed=true) ok
> Oct  9 04:39:05 node1 stonith-ng: [3705]: ERROR: log_operation: Operation 'reboot' [22346] (call 0 from (null)) for host 'node2' with device 'node2-stonith' returned: -2
> Oct  9 04:39:05 node1 stonith-ng: [3705]: ERROR: log_operation: node2-stonith: Rebooting machine @ IPMI:192.168.170.1...Failed
>
> The log shows that stonith failed with return value -2. What does this
> mean? Is there a configuration issue?
> Thanks.
>