<div dir="ltr">Hi,<div style>I've been trying to get fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64) working within pacemaker (pacemaker-1.1.8-7.el6.x86_64) but am unable to get it to work as intended, using fence_rhevm on the command line works as expected, as does stonith_admin but from within pacemaker (triggered by deliberately killing corosync on the node to be fenced):</div>

May 21 22:21:32 defiant corosync[1245]: [TOTEM ] A processor failed, forming new configuration.
May 21 22:21:34 defiant corosync[1245]: [QUORUM] Members[1]: 1
May 21 22:21:34 defiant corosync[1245]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 21 22:21:34 defiant kernel: dlm: closing connection to node 2
May 21 22:21:34 defiant corosync[1245]: [CPG ] chosen downlist: sender r(0) ip(10.10.25.152) ; members(old:2 left:1)
May 21 22:21:34 defiant corosync[1245]: [MAIN ] Completed service synchronization, ready to provide service.
May 21 22:21:34 defiant crmd[1749]: notice: crm_update_peer_state: cman_event_callback: Node enterprise[2] - state is now lost
May 21 22:21:34 defiant crmd[1749]: warning: match_down_event: No match for shutdown action on enterprise
May 21 22:21:34 defiant crmd[1749]: notice: peer_update_callback: Stonith/shutdown of enterprise not matched
May 21 22:21:34 defiant crmd[1749]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
May 21 22:21:34 defiant fenced[1302]: fencing node enterprise
May 21 22:21:34 defiant logger: fence_pcmk[2219]: Requesting Pacemaker fence enterprise (reset)
May 21 22:21:34 defiant stonith_admin[2220]: notice: crm_log_args: Invoked: stonith_admin --reboot enterprise --tolerance 5s
May 21 22:21:35 defiant attrd[1747]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
May 21 22:21:35 defiant attrd[1747]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
May 21 22:21:36 defiant pengine[1748]: notice: unpack_config: On loss of CCM Quorum: Ignore
May 21 22:21:36 defiant pengine[1748]: notice: process_pe_message: Calculated Transition 64: /var/lib/pacemaker/pengine/pe-input-60.bz2
May 21 22:21:36 defiant crmd[1749]: notice: run_graph: Transition 64 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-60.bz2): Complete
May 21 22:21:36 defiant crmd[1749]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
May 21 22:21:44 defiant logger: fence_pcmk[2219]: Call to fence enterprise (reset) failed with rc=255
May 21 22:21:45 defiant fenced[1302]: fence enterprise dev 0.0 agent fence_pcmk result: error from agent
May 21 22:21:45 defiant fenced[1302]: fence enterprise failed
May 21 22:21:48 defiant fenced[1302]: fencing node enterprise
May 21 22:21:48 defiant logger: fence_pcmk[2239]: Requesting Pacemaker fence enterprise (reset)
May 21 22:21:48 defiant stonith_admin[2240]: notice: crm_log_args: Invoked: stonith_admin --reboot enterprise --tolerance 5s
May 21 22:21:58 defiant logger: fence_pcmk[2239]: Call to fence enterprise (reset) failed with rc=255
May 21 22:21:58 defiant fenced[1302]: fence enterprise dev 0.0 agent fence_pcmk result: error from agent
May 21 22:21:58 defiant fenced[1302]: fence enterprise failed
May 21 22:22:01 defiant fenced[1302]: fencing node enterprise

corosync.log shows the same "warning: match_down_event: No match for shutdown action on enterprise" and "notice: peer_update_callback: Stonith/shutdown of enterprise not matched" messages:

May 21 22:21:32 corosync [TOTEM ] A processor failed, forming new configuration.
May 21 22:21:34 corosync [QUORUM] Members[1]: 1
May 21 22:21:34 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 21 22:21:34 [1749] defiant crmd: info: cman_event_callback: Membership 296: quorum retained
May 21 22:21:34 [1744] defiant cib: info: pcmk_cpg_membership: Left[5.0] cib.2
May 21 22:21:34 [1744] defiant cib: info: crm_update_peer_proc: pcmk_cpg_membership: Node enterprise[2] - corosync-cpg is now offline
May 21 22:21:34 [1744] defiant cib: info: pcmk_cpg_membership: Member[5.0] cib.1
May 21 22:21:34 [1745] defiant stonith-ng: info: pcmk_cpg_membership: Left[5.0] stonith-ng.2
May 21 22:21:34 [1745] defiant stonith-ng: info: crm_update_peer_proc: pcmk_cpg_membership: Node enterprise[2] - corosync-cpg is now offline
May 21 22:21:34 corosync [CPG ] chosen downlist: sender r(0) ip(10.10.25.152) ; members(old:2 left:1)
May 21 22:21:34 corosync [MAIN ] Completed service synchronization, ready to provide service.
May 21 22:21:34 [1745] defiant stonith-ng: info: pcmk_cpg_membership: Member[5.0] stonith-ng.1
May 21 22:21:34 [1749] defiant crmd: notice: crm_update_peer_state: cman_event_callback: Node enterprise[2] - state is now lost
May 21 22:21:34 [1749] defiant crmd: info: peer_update_callback: enterprise is now lost (was member)
May 21 22:21:34 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/150, version=0.22.3): OK (rc=0)
May 21 22:21:34 [1749] defiant crmd: info: pcmk_cpg_membership: Left[5.0] crmd.2
May 21 22:21:34 [1749] defiant crmd: info: crm_update_peer_proc: pcmk_cpg_membership: Node enterprise[2] - corosync-cpg is now offline
May 21 22:21:34 [1749] defiant crmd: info: peer_update_callback: Client enterprise/peer now has status [offline] (DC=true)
May 21 22:21:34 [1749] defiant crmd: warning: match_down_event: No match for shutdown action on enterprise
May 21 22:21:34 [1749] defiant crmd: notice: peer_update_callback: Stonith/shutdown of enterprise not matched
May 21 22:21:34 [1749] defiant crmd: info: crm_update_peer_expected: peer_update_callback: Node enterprise[2] - expected state is now down
May 21 22:21:34 [1749] defiant crmd: info: abort_transition_graph: peer_update_callback:211 - Triggered transition abort (complete=1) : Node failure
May 21 22:21:34 [1749] defiant crmd: info: pcmk_cpg_membership: Member[5.0] crmd.1
May 21 22:21:34 [1749] defiant crmd: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
May 21 22:21:34 [1749] defiant crmd: info: abort_transition_graph: do_te_invoke:163 - Triggered transition abort (complete=1) : Peer Halt
May 21 22:21:34 [1749] defiant crmd: info: join_make_offer: Making join offers based on membership 296
May 21 22:21:34 [1749] defiant crmd: info: do_dc_join_offer_all: join-7: Waiting on 1 outstanding join acks
May 21 22:21:34 [1749] defiant crmd: info: update_dc: Set DC to defiant (3.0.7)
May 21 22:21:34 [1749] defiant crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
May 21 22:21:34 [1749] defiant crmd: info: do_dc_join_finalize: join-7: Syncing the CIB from defiant to the rest of the cluster
May 21 22:21:34 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/154, version=0.22.5): OK (rc=0)
May 21 22:21:34 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/155, version=0.22.6): OK (rc=0)
May 21 22:21:34 [1749] defiant crmd: info: stonith_action_create: Initiating action metadata for agent fence_rhevm (target=(null))
May 21 22:21:35 [1749] defiant crmd: info: do_dc_join_ack: join-7: Updating node state to member for defiant
May 21 22:21:35 [1749] defiant crmd: info: erase_status_tag: Deleting xpath: //node_state[@uname='defiant']/lrm
May 21 22:21:35 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='defiant']/lrm (origin=local/crmd/156, version=0.22.7): OK (rc=0)
May 21 22:21:35 [1749] defiant crmd: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
May 21 22:21:35 [1749] defiant crmd: info: abort_transition_graph: do_te_invoke:156 - Triggered transition abort (complete=1) : Peer Cancelled
May 21 22:21:35 [1747] defiant attrd: notice: attrd_local_callback: Sending full refresh (origin=crmd)
May 21 22:21:35 [1747] defiant attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
May 21 22:21:35 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/158, version=0.22.9): OK (rc=0)
May 21 22:21:35 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/160, version=0.22.11): OK (rc=0)
May 21 22:21:36 [1748] defiant pengine: info: unpack_config: Startup probes: enabled
May 21 22:21:36 [1748] defiant pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
May 21 22:21:36 [1748] defiant pengine: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
May 21 22:21:36 [1748] defiant pengine: info: unpack_domains: Unpacking domains
May 21 22:21:36 [1748] defiant pengine: info: determine_online_status_fencing: Node defiant is active
May 21 22:21:36 [1748] defiant pengine: info: determine_online_status: Node defiant is online
May 21 22:21:36 [1748] defiant pengine: info: native_print: st-rhevm (stonith:fence_rhevm): Started defiant
May 21 22:21:36 [1748] defiant pengine: info: LogActions: Leave st-rhevm (Started defiant)
May 21 22:21:36 [1748] defiant pengine: notice: process_pe_message: Calculated Transition 64: /var/lib/pacemaker/pengine/pe-input-60.bz2
May 21 22:21:36 [1749] defiant crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
May 21 22:21:36 [1749] defiant crmd: info: do_te_invoke: Processing graph 64 (ref=pe_calc-dc-1369171296-118) derived from /var/lib/pacemaker/pengine/pe-input-60.bz2
May 21 22:21:36 [1749] defiant crmd: notice: run_graph: Transition 64 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-60.bz2): Complete
May 21 22:21:36 [1749] defiant crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

I can get the node enterprise to fence as expected from the command line with:

stonith_admin --reboot enterprise --tolerance 5s

fence_rhevm -o reboot -a <hypervisor ip> -l <user>@<domain> -p <password> -n enterprise -z
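
For what it's worth, my understanding is that stonith-ng doesn't invoke the agent with command-line options but feeds it key=value pairs on stdin, so the pacemaker-style invocation should be roughly equivalent to something like this (I'm assuming port= is the stdin equivalent of -n for the target VM name; the other keys match the instance attributes in my CIB below):

echo -e "action=reboot\nipaddr=<hypervisor ip>\nlogin=<user>@<domain>\npasswd=<password>\nport=enterprise\nssl=1" | fence_rhevm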

My config is as follows:

cluster.conf -----------------------------------

<?xml version="1.0"?>
<cluster config_version="1" name="cluster">
  <logging debug="off"/>
  <clusternodes>
    <clusternode name="defiant" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="defiant"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="enterprise" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="enterprise"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
  <cman two_node="1" expected_votes="1">
  </cman>
</cluster>

pacemaker cib ---------------------------------

Stonith device created with:

pcs stonith create st-rhevm fence_rhevm login="<user>@<domain>" passwd="<password>" ssl=1 ipaddr="<hypervisor ip>" verbose=1 debug="/tmp/debug.log"
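
(Assuming I have the stonith_admin options right, it might also be worth checking which devices stonith-ng thinks can fence the node:

stonith_admin --list enterprise

which should list st-rhevm if the device is being considered for that target.)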

<cib epoch="18" num_updates="88" admin_epoch="0" validate-with="pacemaker-1.2" update-origin="defiant" update-client="cibadmin" cib-last-written="Tue May 21 07:55:31 2013" crm_feature_set="3.0.7" have-quorum="1" dc-uuid="defiant">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.8-7.el6-394e906"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="cman"/>
        <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="defiant" uname="defiant"/>
      <node id="enterprise" uname="enterprise"/>
    </nodes>
    <resources>
      <primitive class="stonith" id="st-rhevm" type="fence_rhevm">
        <instance_attributes id="st-rhevm-instance_attributes">
          <nvpair id="st-rhevm-instance_attributes-login" name="login" value="<user>@<domain>"/>
          <nvpair id="st-rhevm-instance_attributes-passwd" name="passwd" value="<password>"/>
          <nvpair id="st-rhevm-instance_attributes-debug" name="debug" value="/tmp/debug.log"/>
          <nvpair id="st-rhevm-instance_attributes-ssl" name="ssl" value="1"/>
          <nvpair id="st-rhevm-instance_attributes-verbose" name="verbose" value="1"/>
          <nvpair id="st-rhevm-instance_attributes-ipaddr" name="ipaddr" value="<hypervisor ip>"/>
        </instance_attributes>
      </primitive>
    </resources>
    <constraints/>
  </configuration>
  <status>
    <node_state id="defiant" uname="defiant" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
      <transient_attributes id="defiant">
        <instance_attributes id="status-defiant">
          <nvpair id="status-defiant-probe_complete" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
      <lrm id="defiant">
        <lrm_resources>
          <lrm_resource id="st-rhevm" type="fence_rhevm" class="stonith">
            <lrm_rsc_op id="st-rhevm_last_0" operation_key="st-rhevm_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7" transition-key="2:1:0:1e7972e8-6f9a-4325-b9c3-3d7e2950d996" transition-magic="0:0;2:1:0:1e7972e8-6f9a-4325-b9c3-3d7e2950d996" call-id="14" rc-code="0" op-status="0" interval="0" last-run="1369119332" last-rc-change="0" exec-time="232" queue-time="0" op-digest="3bc7e1ce413fe37998a289f77f85d159"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
    </node_state>
    <node_state id="enterprise" uname="enterprise" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
      <lrm id="enterprise">
        <lrm_resources>
          <lrm_resource id="st-rhevm" type="fence_rhevm" class="stonith">
            <lrm_rsc_op id="st-rhevm_last_0" operation_key="st-rhevm_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.7" transition-key="5:59:7:8170c498-f66b-4974-b3c0-c17eb45ba5cb" transition-magic="0:7;5:59:7:8170c498-f66b-4974-b3c0-c17eb45ba5cb" call-id="5" rc-code="7" op-status="0" interval="0" last-run="1369170800" last-rc-change="0" exec-time="4" queue-time="0" op-digest="3bc7e1ce413fe37998a289f77f85d159"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
      <transient_attributes id="enterprise">
        <instance_attributes id="status-enterprise">
          <nvpair id="status-enterprise-probe_complete" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
    </node_state>
  </status>
</cib>

The debug log output from fence_rhevm doesn't appear to show pacemaker trying to request the reboot, only a "vms" command sent to the hypervisor, which responds with XML listing the VMs.
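
(To narrow that down, the agent's port list can presumably be queried the same way pacemaker would query it, via a "list" action on stdin; I'm assuming action=list is accepted here, as the command-line -o list is:

echo -e "action=list\nipaddr=<hypervisor ip>\nlogin=<user>@<domain>\npasswd=<password>\nssl=1" | fence_rhevm

which would show whether the VM name RHEV-M reports actually matches the cluster node name "enterprise" that stonith-ng is being asked to fence.)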

I can't quite see why it's failing. Are you aware of any issues with fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64) not working with pacemaker (pacemaker-1.1.8-7.el6.x86_64) on RHEL 6.4?

All the best,
/John