<html><body><div style="color:#000; background-color:#fff; font-family:arial, helvetica, sans-serif;font-size:10pt"><div id="yiv2130285856"><div><div style="color:#000;background-color:#fff;font-family:arial, helvetica, sans-serif;font-size:10pt;"><div id="yiv2130285856yui_3_2_0_20_134098290286148"><span id="yiv2130285856yui_3_2_0_20_134098290286167">Hi - </span><span><br></span></div><div id="yiv2130285856yui_3_2_0_20_1340982902861150"><br><span id="yiv2130285856yui_3_2_0_20_134098290286167"></span></div><div id="yiv2130285856yui_3_2_0_20_1340982902861379"><span id="yiv2130285856yui_3_2_0_20_134098290286167">I'm new to Pacemaker and now have a shiny new configuration that will not stonith. This is a test system using KVM and external/libvirt; all VMs are running CentOS 5.</span><span id="yiv2130285856yui_3_2_0_20_134098290286167"><br><br>I'm (really) hoping someone might be willing to help troubleshoot this configuration. Thank you for your time and effort!<br></span></div><div id="yiv2130285856yui_3_2_0_20_1340982902861220"><br>
<div id="yiv2130285856yui_3_2_0_20_1340982902861401"><span id="yiv2130285856yui_3_2_0_20_134098290286167">The items that look suspect to me are:</span></div>
<div id="yiv2130285856yui_3_2_0_20_1340982902861414"><span id="yiv2130285856yui_3_2_0_20_134098290286167">1. st-nodes has no 'location' entry</span></div>
<div id="yiv2130285856yui_3_2_0_20_1340982902861417"><span id="yiv2130285856yui_3_2_0_20_134098290286167">2. logs report node_list=<br>
3. resource st-nodes is Stopped</span></div><br><span id="yiv2130285856yui_3_2_0_20_134098290286167"></span></div><div id="yiv2130285856yui_3_2_0_20_1340982902861221"><span id="yiv2130285856yui_3_2_0_20_134098290286167">I have attached a clip of the configuration below. The full configuration and log file can be found at http://pastebin.com/bS87FXUr</span></div><div id="yiv2130285856yui_3_2_0_20_1340982902861304"><br><span id="yiv2130285856yui_3_2_0_20_134098290286167"></span></div><div id="yiv2130285856yui_3_2_0_20_1340982902861305"><span id="yiv2130285856yui_3_2_0_20_134098290286167">Per 'stonith -t external/libvirt -h' I have configured stonith using:</span></div><div id="yiv2130285856yui_3_2_0_20_1340982902861310"><br><span id="yiv2130285856yui_3_2_0_20_134098290286167"></span></div><div id="yiv2130285856yui_3_2_0_20_1340982902861311"><span id="yiv2130285856yui_3_2_0_20_134098290286167">primitive st-nodes stonith:external/libvirt
\<br> params hostlist="st15-mds1,st15-mds2,st15-oss1,st15-oss2" hypervisor_uri="qemu+ssh://wc0008/system" stonith-timeout="30" \<br> op start interval="0" timeout="60"
\<br> op stop interval="0" timeout="60" \<br> op monitor interval="60"</span></div><br><div id="yiv2130285856yui_3_2_0_20_1340982902861263"><span id="yiv2130285856yui_3_2_0_20_134098290286167">And a section of the log file is:</span></div><div id="yiv2130285856yui_3_2_0_20_1340982902861296"><br><span id="yiv2130285856yui_3_2_0_20_134098290286167"></span></div><div id="yiv2130285856yui_3_2_0_20_1340982902861297"><span id="yiv2130285856yui_3_2_0_20_134098290286167">Jun 29 11:02:07 st15-mds2 stonithd: [4485]: ERROR: Failed to STONITH the node st15-mds1: optype=RESET, op_result=TIMEOUT<br>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: tengine_stonith_callback: call=-65, optype=1, node_name=st15-mds1, result=2, node_list=, action=23:90:0:aac961e7-b06b-4dfd-ae60-c882407b16b5<br>Jun 29 11:02:07 st15-mds2 crmd: [4490]: ERROR: tengine_stonith_callback: Stonith of st15-mds1 failed
(2)... aborting transition.<br>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: abort_transition_graph: tengine_stonith_callback:409 - Triggered transition abort (complete=0) : Stonith failed<br>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000<br>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: update_abort_priority: Abort
action done superceeded by restart<br>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: run_graph: ====================================================<br>Jun 29 11:02:07 st15-mds2 crmd: [4490]: notice: run_graph: Transition 90 (Complete=2, Pending=0, Fired=0, Skipped=5, Incomplete=0, Source=/var/lib/pengine/pe-warn-173.bz2): Stopped<br>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: te_graph_trigger: Transition 90 is now complete<br>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]<br>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_state_transition: All 3 cluster nodes are eligible to run resources.<br>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_pe_invoke: Query 299: Requesting the current CIB: S_POLICY_ENGINE<br>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_pe_invoke_callback: Invoking the PE: query=299,
ref=pe_calc-dc-1340982127-223, seq=396, quorate=1<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: determine_online_status: Node st15-mds2 is online<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: pe_fence_node: Node st15-mds1 will be fenced because it is un-expectedly down<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: determine_online_status_fencing: ha_state=active, ccm_state=false, crm_state=online, join_state=member, expected=member<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: determine_online_status: Node st15-mds1 is unclean<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: determine_online_status: Node st15-oss1 is online<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: determine_online_status: Node st15-oss2 is online<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice:
native_print: lustre-OST0000 (ocf::heartbeat:Filesystem): Started st15-oss1<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: lustre-OST0001 (ocf::heartbeat:Filesystem): Started st15-oss1<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: lustre-OST0002 (ocf::heartbeat:Filesystem): Started st15-oss2<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: lustre-OST0003 (ocf::heartbeat:Filesystem): Started st15-oss2<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: lustre-MDT0000 (ocf::heartbeat:Filesystem): Started st15-mds1<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: native_print: st-nodes (stonith:external/libvirt): Stopped <br>Jun 29 11:02:07 st15-mds2 pengine:
[4489]: info: native_color: Resource st-nodes cannot run anywhere<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: custom_action: Action lustre-MDT0000_stop_0 on st15-mds1 is unrunnable (offline)<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: custom_action: Marking node st15-mds1 unclean<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: RecurringOp: Start recurring monitor (120s) for lustre-MDT0000 on st15-mds2<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: stage6: Scheduling Node st15-mds1 for STONITH<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: native_stop_constraints: lustre-MDT0000_stop_0 is implicit after st15-mds1 is fenced<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Leave resource lustre-OST0000 (Started st15-oss1)<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Leave resource lustre-OST0001 (Started st15-oss1)<br>Jun
29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Leave resource lustre-OST0002 (Started st15-oss2)<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Leave resource lustre-OST0003 (Started st15-oss2)<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Move resource lustre-MDT0000 (Started st15-mds1 -> st15-mds2)<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: notice: LogActions: Leave resource st-nodes (Stopped)<br>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: WARN: process_pe_message: Transition 91: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pengine/pe-warn-174.bz2<br>Jun
29 11:02:07 st15-mds2 crmd: [4490]: info: unpack_graph: Unpacked transition 91: 7 actions in 7 synapses<br>Jun 29 11:02:07 st15-mds2 pengine: [4489]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.<br>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: do_te_invoke: Processing graph 91 (ref=pe_calc-dc-1340982127-223) derived from /var/lib/pengine/pe-warn-174.bz2<br>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: te_pseudo_action: Pseudo action 21 fired and confirmed<br>Jun 29 11:02:07 st15-mds2 crmd: [4490]: info: te_fence_node: Executing reboot fencing operation (23) on st15-mds1 (timeout=60000)<br>Jun 29 11:02:07 st15-mds2 stonithd: [4485]: info: client tengine [pid: 4490] requests a STONITH operation RESET on node st15-mds1<br>Jun 29 11:02:07 st15-mds2 stonithd: [4485]: info: we can't manage st15-mds1, broadcast request to other nodes<br>Jun 29 11:02:07 st15-mds2 stonithd:
[4485]: info: Broadcasting the message succeeded: require others to stonith node st15-mds1.<br><br>Thank you!<br></span></div><div id="yiv2130285856yui_3_2_0_20_134098290286151"> </div><div id="yiv2130285856yui_3_2_0_20_134098290286154"><span id="yiv2130285856yui_3_2_0_20_134098290286170" class="yiv2130285856yui_3_2_0_20_134098290286158" style="font-size:10px;font-family:arial, helvetica, sans-serif;">Brett Lee<br>Everything Penguin - <span class="yiv2130285856Apple-tab-span" style="white-space:pre;"></span><a rel="nofollow" target="_blank" href="http://etpenguin.com/">http://etpenguin.com</a><br></span></div></div></div></div></div></body></html>