[ClusterLabs] successful ipmi stonith still times out
Ron Kerry
rkerry at sgi.com
Thu Dec 17 17:32:42 CET 2015
I have a customer (running SLE 11 SP4 HAE) who is seeing the following
stonith behavior with the external/ipmi stonith plugin.
Dec 15 14:21:43 test4 pengine[24002]: warning: pe_fence_node: Node
test3 will be fenced because termination was requested
Dec 15 14:21:43 test4 pengine[24002]: warning: determine_online_status:
Node test3 is unclean
Dec 15 14:21:43 test4 pengine[24002]: warning: stage6: Scheduling Node
test3 for STONITH
... it issues the reset and it is noted ...
Dec 15 14:21:45 test4 external/ipmi(STONITH-test3)[177184]: [177197]:
debug: ipmitool output: Chassis Power Control: Reset
Dec 15 14:21:46 test4 stonith-ng[23999]: notice: log_operation:
Operation 'reboot' [177179] (call 2 from crmd.24003) for host 'test3'
with device 'STONITH-test3' returned: 0 (OK)
... test3 does go down ...
Dec 15 14:22:21 test4 kernel: [90153.906461] Cell 2 (test3) left the
membership
... but the stonith operation times out (it said OK earlier) ...
Dec 15 14:22:56 test4 stonith-ng[23999]: notice: remote_op_timeout:
Action reboot (a399a8cb-541a-455e-8d7c-9072d48667d1) for test3
(crmd.24003) timed out
Dec 15 14:23:05 test4 external/ipmi(STONITH-test3)[177667]: [177678]:
debug: ipmitool output: Chassis Power is on
Dec 15 14:23:56 test4 crmd[24003]: error:
stonith_async_timeout_handler: Async call 2 timed out after 132000ms
Dec 15 14:23:56 test4 crmd[24003]: notice: tengine_stonith_callback:
Stonith operation 2/51:100:0:f43dc87c-faf0-4034-8b51-be0c13c95656: Timer
expired (-62)
Dec 15 14:23:56 test4 crmd[24003]: notice: tengine_stonith_callback:
Stonith operation 2 for test3 failed (Timer expired): aborting transition.
Dec 15 14:23:56 test4 crmd[24003]: notice: abort_transition_graph:
Transition aborted: Stonith failed (source=tengine_stonith_callback:697, 0)
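To make the timing easier to follow, here is a rough Python sketch that turns the timestamps above into offsets from the moment fencing was scheduled. The timestamps are copied from the log excerpts; the names EVENTS, parse and offsets are only illustrative, and the year 2015 is assumed from the date of this post because syslog does not record it.

```python
from datetime import datetime

# Timestamps copied from the log excerpts above. The labels are informal
# summaries, not Pacemaker terminology.
EVENTS = [
    ("pengine schedules test3 for STONITH", "Dec 15 14:21:43"),
    ("ipmitool output: Chassis Power Control: Reset", "Dec 15 14:21:45"),
    ("stonith-ng: reboot returned 0 (OK)", "Dec 15 14:21:46"),
    ("kernel: test3 left the membership", "Dec 15 14:22:21"),
    ("stonith-ng: remote_op_timeout", "Dec 15 14:22:56"),
    ("ipmitool output: Chassis Power is on", "Dec 15 14:23:05"),
    ("crmd: async call 2 timed out after 132000ms", "Dec 15 14:23:56"),
]


def parse(stamp, year=2015):
    """Parse a syslog-style timestamp; the year is an assumption.

    %b expects English month abbreviations, which holds in the C locale.
    """
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def offsets(events):
    """Return (label, seconds since the first event) for each event."""
    start = parse(events[0][1])
    return [
        (label, int((parse(stamp) - start).total_seconds()))
        for label, stamp in events
    ]


for label, secs in offsets(EVENTS):
    print(f"+{secs:4d}s  {label}")
```

By this arithmetic the device reports OK 3 seconds after scheduling, remote_op_timeout is logged 70 seconds after that OK, and the crmd timer expires 133 seconds after scheduling, which roughly matches the 132000ms in its own message.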
This looks like a bug, but a quick search did not turn up anything. Does
anyone recognize this problem?
--
Ron Kerry