[Pacemaker] fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64) not working with pacemaker (pacemaker-1.1.8-7.el6.x86_64) on RHEL6.4

John McCabe john at johnmccabe.net
Wed May 22 11:42:29 EDT 2013


FYI - I've opened a ticket in the RH Bugzilla (
https://bugzilla.redhat.com/show_bug.cgi?id=966150) against the
fence-agents component.


On Wed, May 22, 2013 at 12:00 PM, John McCabe <john at johnmccabe.net> wrote:

> No joy with ipport, sadly.
>
> <nvpair id="st-rhevm-instance_attributes-ipport" name="ipport"
> value="443"/>
> <nvpair id="st-rhevm-instance_attributes-shell_timeout"
> name="shell_timeout" value="10"/>
>
> Can you share the changes you made to fence_rhevm for the API change?
> I've got what *should* be the latest packages from the HA channel on both
> systems.
>
>
> On Wed, May 22, 2013 at 11:34 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>
>>
>> On 22/05/2013, at 7:31 PM, John McCabe <john at johnmccabe.net> wrote:
>>
>> > Hi,
>> > I've been trying to get fence_rhevm
>> (fence-agents-3.1.5-25.el6_4.2.x86_64) working with pacemaker
>> (pacemaker-1.1.8-7.el6.x86_64), but I can't get it to work as intended.
>> Running fence_rhevm on the command line works as expected, as does
>> stonith_admin, but from within pacemaker (triggered by deliberately
>> killing corosync on the node to be fenced) I see:
>> >
>> > May 21 22:21:32 defiant corosync[1245]:   [TOTEM ] A processor failed,
>> forming new configuration.
>> > May 21 22:21:34 defiant corosync[1245]:   [QUORUM] Members[1]: 1
>> > May 21 22:21:34 defiant corosync[1245]:   [TOTEM ] A processor joined
>> or left the membership and a new membership was formed.
>> > May 21 22:21:34 defiant kernel: dlm: closing connection to node 2
>> > May 21 22:21:34 defiant corosync[1245]:   [CPG   ] chosen downlist:
>> sender r(0) ip(10.10.25.152) ; members(old:2 left:1)
>> > May 21 22:21:34 defiant corosync[1245]:   [MAIN  ] Completed service
>> synchronization, ready to provide service.
>> > May 21 22:21:34 defiant crmd[1749]:   notice: crm_update_peer_state:
>> cman_event_callback: Node enterprise[2] - state is now lost
>> > May 21 22:21:34 defiant crmd[1749]:  warning: match_down_event: No
>> match for shutdown action on enterprise
>> > May 21 22:21:34 defiant crmd[1749]:   notice: peer_update_callback:
>> Stonith/shutdown of enterprise not matched
>> > May 21 22:21:34 defiant crmd[1749]:   notice: do_state_transition:
>> State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN
>> cause=C_FSA_INTERNAL origin=check_join_state ]
>> > May 21 22:21:34 defiant fenced[1302]: fencing node enterprise
>> > May 21 22:21:34 defiant logger: fence_pcmk[2219]: Requesting Pacemaker
>> fence enterprise (reset)
>> > May 21 22:21:34 defiant stonith_admin[2220]:   notice: crm_log_args:
>> Invoked: stonith_admin --reboot enterprise --tolerance 5s
>> > May 21 22:21:35 defiant attrd[1747]:   notice: attrd_local_callback:
>> Sending full refresh (origin=crmd)
>> > May 21 22:21:35 defiant attrd[1747]:   notice: attrd_trigger_update:
>> Sending flush op to all hosts for: probe_complete (true)
>> > May 21 22:21:36 defiant pengine[1748]:   notice: unpack_config: On loss
>> of CCM Quorum: Ignore
>> > May 21 22:21:36 defiant pengine[1748]:   notice: process_pe_message:
>> Calculated Transition 64: /var/lib/pacemaker/pengine/pe-input-60.bz2
>> > May 21 22:21:36 defiant crmd[1749]:   notice: run_graph: Transition 64
>> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>> Source=/var/lib/pacemaker/pengine/pe-input-60.bz2): Complete
>> > May 21 22:21:36 defiant crmd[1749]:   notice: do_state_transition:
>> State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
>> cause=C_FSA_INTERNAL origin=notify_crmd ]
>> > May 21 22:21:44 defiant logger: fence_pcmk[2219]: Call to fence
>> enterprise (reset) failed with rc=255
>> > May 21 22:21:45 defiant fenced[1302]: fence enterprise dev 0.0 agent
>> fence_pcmk result: error from agent
>> > May 21 22:21:45 defiant fenced[1302]: fence enterprise failed
>> > May 21 22:21:48 defiant fenced[1302]: fencing node enterprise
>> > May 21 22:21:48 defiant logger: fence_pcmk[2239]: Requesting Pacemaker
>> fence enterprise (reset)
>> > May 21 22:21:48 defiant stonith_admin[2240]:   notice: crm_log_args:
>> Invoked: stonith_admin --reboot enterprise --tolerance 5s
>> > May 21 22:21:58 defiant logger: fence_pcmk[2239]: Call to fence
>> enterprise (reset) failed with rc=255
>> > May 21 22:21:58 defiant fenced[1302]: fence enterprise dev 0.0 agent
>> fence_pcmk result: error from agent
>> > May 21 22:21:58 defiant fenced[1302]: fence enterprise failed
>> > May 21 22:22:01 defiant fenced[1302]: fencing node enterprise
>> >
>> > and corosync.log shows "warning: match_down_event: No match for
>> shutdown action on enterprise" and "notice: peer_update_callback:
>> Stonith/shutdown of enterprise not matched":
>> >
>> > May 21 22:21:32 corosync [TOTEM ] A processor failed, forming new
>> configuration.
>> > May 21 22:21:34 corosync [QUORUM] Members[1]: 1
>> > May 21 22:21:34 corosync [TOTEM ] A processor joined or left the
>> membership and a new membership was formed.
>> > May 21 22:21:34 [1749] defiant       crmd:     info:
>> cman_event_callback:       Membership 296: quorum retained
>> > May 21 22:21:34 [1744] defiant        cib:     info:
>> pcmk_cpg_membership:       Left[5.0] cib.2
>> > May 21 22:21:34 [1744] defiant        cib:     info:
>> crm_update_peer_proc:      pcmk_cpg_membership: Node enterprise[2] -
>> corosync-cpg is now offline
>> > May 21 22:21:34 [1744] defiant        cib:     info:
>> pcmk_cpg_membership:       Member[5.0] cib.1
>> > May 21 22:21:34 [1745] defiant stonith-ng:     info:
>> pcmk_cpg_membership:       Left[5.0] stonith-ng.2
>> > May 21 22:21:34 [1745] defiant stonith-ng:     info:
>> crm_update_peer_proc:      pcmk_cpg_membership: Node enterprise[2] -
>> corosync-cpg is now offline
>> > May 21 22:21:34 corosync [CPG   ] chosen downlist: sender r(0)
>> ip(10.10.25.152) ; members(old:2 left:1)
>> > May 21 22:21:34 corosync [MAIN  ] Completed service synchronization,
>> ready to provide service.
>> > May 21 22:21:34 [1745] defiant stonith-ng:     info:
>> pcmk_cpg_membership:       Member[5.0] stonith-ng.1
>> > May 21 22:21:34 [1749] defiant       crmd:   notice:
>> crm_update_peer_state:     cman_event_callback: Node enterprise[2] - state
>> is now lost
>> > May 21 22:21:34 [1749] defiant       crmd:     info:
>> peer_update_callback:      enterprise is now lost (was member)
>> > May 21 22:21:34 [1744] defiant        cib:     info:
>> cib_process_request:       Operation complete: op cib_modify for section
>> nodes (origin=local/crmd/150, version=0.22.3): OK (rc=0)
>> > May 21 22:21:34 [1749] defiant       crmd:     info:
>> pcmk_cpg_membership:       Left[5.0] crmd.2
>> > May 21 22:21:34 [1749] defiant       crmd:     info:
>> crm_update_peer_proc:      pcmk_cpg_membership: Node enterprise[2] -
>> corosync-cpg is now offline
>> > May 21 22:21:34 [1749] defiant       crmd:     info:
>> peer_update_callback:      Client enterprise/peer now has status [offline]
>> (DC=true)
>> > May 21 22:21:34 [1749] defiant       crmd:  warning: match_down_event:
>>  No match for shutdown action on enterprise
>> > May 21 22:21:34 [1749] defiant       crmd:   notice:
>> peer_update_callback:      Stonith/shutdown of enterprise not matched
>> > May 21 22:21:34 [1749] defiant       crmd:     info:
>> crm_update_peer_expected:  peer_update_callback: Node enterprise[2] -
>> expected state is now down
>> > May 21 22:21:34 [1749] defiant       crmd:     info:
>> abort_transition_graph:    peer_update_callback:211 - Triggered transition
>> abort (complete=1) : Node failure
>> > May 21 22:21:34 [1749] defiant       crmd:     info:
>> pcmk_cpg_membership:       Member[5.0] crmd.1
>> > May 21 22:21:34 [1749] defiant       crmd:   notice:
>> do_state_transition:       State transition S_IDLE -> S_INTEGRATION [
>> input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
>> > May 21 22:21:34 [1749] defiant       crmd:     info:
>> abort_transition_graph:    do_te_invoke:163 - Triggered transition abort
>> (complete=1) : Peer Halt
>> > May 21 22:21:34 [1749] defiant       crmd:     info: join_make_offer:
>> Making join offers based on membership 296
>> > May 21 22:21:34 [1749] defiant       crmd:     info:
>> do_dc_join_offer_all:      join-7: Waiting on 1 outstanding join acks
>> > May 21 22:21:34 [1749] defiant       crmd:     info: update_dc:
>> Set DC to defiant (3.0.7)
>> > May 21 22:21:34 [1749] defiant       crmd:     info:
>> do_state_transition:       State transition S_INTEGRATION ->
>> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL
>> origin=check_join_state ]
>> > May 21 22:21:34 [1749] defiant       crmd:     info:
>> do_dc_join_finalize:       join-7: Syncing the CIB from defiant to the rest
>> of the cluster
>> > May 21 22:21:34 [1744] defiant        cib:     info:
>> cib_process_request:       Operation complete: op cib_sync for section
>> 'all' (origin=local/crmd/154, version=0.22.5): OK (rc=0)
>> > May 21 22:21:34 [1744] defiant        cib:     info:
>> cib_process_request:       Operation complete: op cib_modify for section
>> nodes (origin=local/crmd/155, version=0.22.6): OK (rc=0)
>> > May 21 22:21:34 [1749] defiant       crmd:     info:
>> stonith_action_create:     Initiating action metadata for agent fence_rhevm
>> (target=(null))
>> > May 21 22:21:35 [1749] defiant       crmd:     info: do_dc_join_ack:
>>  join-7: Updating node state to member for defiant
>> > May 21 22:21:35 [1749] defiant       crmd:     info: erase_status_tag:
>>  Deleting xpath: //node_state[@uname='defiant']/lrm
>> > May 21 22:21:35 [1744] defiant        cib:     info:
>> cib_process_request:       Operation complete: op cib_delete for section
>> //node_state[@uname='defiant']/lrm (origin=local/crmd/156, version=0.22.7):
>> OK (rc=0)
>> > May 21 22:21:35 [1749] defiant       crmd:     info:
>> do_state_transition:       State transition S_FINALIZE_JOIN ->
>> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL
>> origin=check_join_state ]
>> > May 21 22:21:35 [1749] defiant       crmd:     info:
>> abort_transition_graph:    do_te_invoke:156 - Triggered transition abort
>> (complete=1) : Peer Cancelled
>> > May 21 22:21:35 [1747] defiant      attrd:   notice:
>> attrd_local_callback:      Sending full refresh (origin=crmd)
>> > May 21 22:21:35 [1747] defiant      attrd:   notice:
>> attrd_trigger_update:      Sending flush op to all hosts for:
>> probe_complete (true)
>> > May 21 22:21:35 [1744] defiant        cib:     info:
>> cib_process_request:       Operation complete: op cib_modify for section
>> nodes (origin=local/crmd/158, version=0.22.9): OK (rc=0)
>> > May 21 22:21:35 [1744] defiant        cib:     info:
>> cib_process_request:       Operation complete: op cib_modify for section
>> cib (origin=local/crmd/160, version=0.22.11): OK (rc=0)
>> > May 21 22:21:36 [1748] defiant    pengine:     info: unpack_config:
>> Startup probes: enabled
>> > May 21 22:21:36 [1748] defiant    pengine:   notice: unpack_config:
>> On loss of CCM Quorum: Ignore
>> > May 21 22:21:36 [1748] defiant    pengine:     info: unpack_config:
>> Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
>> > May 21 22:21:36 [1748] defiant    pengine:     info: unpack_domains:
>>  Unpacking domains
>> > May 21 22:21:36 [1748] defiant    pengine:     info:
>> determine_online_status_fencing:   Node defiant is active
>> > May 21 22:21:36 [1748] defiant    pengine:     info:
>> determine_online_status:   Node defiant is online
>> > May 21 22:21:36 [1748] defiant    pengine:     info: native_print:
>>  st-rhevm        (stonith:fence_rhevm):  Started defiant
>> > May 21 22:21:36 [1748] defiant    pengine:     info: LogActions:
>>  Leave   st-rhevm        (Started defiant)
>> > May 21 22:21:36 [1748] defiant    pengine:   notice:
>> process_pe_message:        Calculated Transition 64:
>> /var/lib/pacemaker/pengine/pe-input-60.bz2
>> > May 21 22:21:36 [1749] defiant       crmd:     info:
>> do_state_transition:       State transition S_POLICY_ENGINE ->
>> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
>> origin=handle_response ]
>> > May 21 22:21:36 [1749] defiant       crmd:     info: do_te_invoke:
>>  Processing graph 64 (ref=pe_calc-dc-1369171296-118) derived from
>> /var/lib/pacemaker/pengine/pe-input-60.bz2
>> > May 21 22:21:36 [1749] defiant       crmd:   notice: run_graph:
>> Transition 64 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>> Source=/var/lib/pacemaker/pengine/pe-input-60.bz2): Complete
>> > May 21 22:21:36 [1749] defiant       crmd:   notice:
>> do_state_transition:       State transition S_TRANSITION_ENGINE -> S_IDLE [
>> input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>> >
>> >
>> > I can get the node enterprise to fence as expected from the command
>> line with:
>> >
>> > stonith_admin --reboot enterprise --tolerance 5s
>> >
>> > fence_rhevm -o reboot -a <hypervisor ip> -l <user>@<domain> -p
>> <password> -n enterprise -z
>> >
>> > My config is as follows:
>> >
>> > cluster.conf -----------------------------------
>> >
>> > <?xml version="1.0"?>
>> > <cluster config_version="1" name="cluster">
>> >   <logging debug="off"/>
>> >   <clusternodes>
>> >     <clusternode name="defiant" nodeid="1">
>> >       <fence>
>> >         <method name="pcmk-redirect">
>> >           <device name="pcmk" port="defiant"/>
>> >         </method>
>> >       </fence>
>> >     </clusternode>
>> >     <clusternode name="enterprise" nodeid="2">
>> >       <fence>
>> >         <method name="pcmk-redirect">
>> >           <device name="pcmk" port="enterprise"/>
>> >         </method>
>> >       </fence>
>> >     </clusternode>
>> >   </clusternodes>
>> >   <fencedevices>
>> >     <fencedevice name="pcmk" agent="fence_pcmk"/>
>> >   </fencedevices>
>> >   <cman two_node="1" expected_votes="1">
>> >   </cman>
>> > </cluster>
>> >
>> > pacemaker cib ---------------------------------
>> >
>> > Stonith device created with:
>> >
>> > pcs stonith create st-rhevm fence_rhevm login="<user>@<domain>"
>> passwd="<password>" ssl=1 ipaddr="<hypervisor ip>" verbose=1
>> debug="/tmp/debug.log"
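>> >
>> > (I haven't set port or pcmk_host_map, since the RHEV VM names here match
>> > the cluster node names; if they didn't, I assume something along the
>> > lines of
>> >
>> > pcs stonith update st-rhevm pcmk_host_map="defiant:<vm name>;enterprise:<vm name>"
>> >
>> > would be needed, with <vm name> as placeholders for the actual VM names.)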
>> >
>> >
>> > <cib epoch="18" num_updates="88" admin_epoch="0"
>> validate-with="pacemaker-1.2" update-origin="defiant"
>> update-client="cibadmin" cib-last-written="Tue May 21 07:55:31 2013"
>> crm_feature_set="3.0.7" have-quorum="1" dc-uuid="defiant">
>> >   <configuration>
>> >     <crm_config>
>> >       <cluster_property_set id="cib-bootstrap-options">
>> >         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
>> value="1.1.8-7.el6-394e906"/>
>> >         <nvpair id="cib-bootstrap-options-cluster-infrastructure"
>> name="cluster-infrastructure" value="cman"/>
>> >         <nvpair id="cib-bootstrap-options-no-quorum-policy"
>> name="no-quorum-policy" value="ignore"/>
>> >         <nvpair id="cib-bootstrap-options-stonith-enabled"
>> name="stonith-enabled" value="true"/>
>> >       </cluster_property_set>
>> >     </crm_config>
>> >     <nodes>
>> >       <node id="defiant" uname="defiant"/>
>> >       <node id="enterprise" uname="enterprise"/>
>> >     </nodes>
>> >     <resources>
>> >       <primitive class="stonith" id="st-rhevm" type="fence_rhevm">
>> >         <instance_attributes id="st-rhevm-instance_attributes">
>> >           <nvpair id="st-rhevm-instance_attributes-login" name="login"
>> value="<user>@<domain>"/>
>> >           <nvpair id="st-rhevm-instance_attributes-passwd"
>> name="passwd" value="<password>"/>
>> >           <nvpair id="st-rhevm-instance_attributes-debug" name="debug"
>> value="/tmp/debug.log"/>
>> >           <nvpair id="st-rhevm-instance_attributes-ssl" name="ssl"
>> value="1"/>
>> >           <nvpair id="st-rhevm-instance_attributes-verbose"
>> name="verbose" value="1"/>
>> >           <nvpair id="st-rhevm-instance_attributes-ipaddr"
>> name="ipaddr" value="<hypervisor ip>"/>
>> >         </instance_attributes>
>> >       </primitive>
>>
>> Mine is:
>>
>>       <primitive id="Fencing" class="stonith" type="fence_rhevm">
>>         <instance_attributes id="Fencing-params">
>>           <nvpair id="Fencing-ipport" name="ipport" value="443"/>
>>           <nvpair id="Fencing-shell_timeout" name="shell_timeout"
>> value="10"/>
>>           <nvpair id="Fencing-passwd" name="passwd" value="{pass}"/>
>>           <nvpair id="Fencing-ipaddr" name="ipaddr" value="{ip}"/>
>>           <nvpair id="Fencing-ssl" name="ssl" value="1"/>
>>           <nvpair id="Fencing-login" name="login" value="{user}@
>> {domain}"/>
>>         </instance_attributes>
>>         <operations>
>>           <op id="Fencing-monitor-120s" interval="120s" name="monitor"
>> timeout="120s"/>
>>           <op id="Fencing-stop-0" interval="0" name="stop" timeout="60s"/>
>>           <op id="Fencing-start-0" interval="0" name="start"
>> timeout="60s"/>
>>         </operations>
>>       </primitive>
>>
>> Maybe ipport is important?
>> Also, there was a RHEVM API change recently; I had to update the
>> fence_rhevm agent before it would work again.
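>>
>> If you want to rule the API itself in or out, a quick manual check against
>> the REST API (assuming the stock RHEV-M endpoints, untested from here)
>> would be something like:
>>
>>   curl -k -u '<user>@<domain>:<password>' \
>>     'https://<hypervisor ip>:443/api/vms?search=name%3Denterprise'
>>
>> which is roughly what the agent does before issuing the stop/start actions.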
>>
>> >     </resources>
>> >     <constraints/>
>> >   </configuration>
>> >   <status>
>> >     <node_state id="defiant" uname="defiant" in_ccm="true"
>> crmd="online" crm-debug-origin="do_state_transition" join="member"
>> expected="member">
>> >       <transient_attributes id="defiant">
>> >         <instance_attributes id="status-defiant">
>> >           <nvpair id="status-defiant-probe_complete"
>> name="probe_complete" value="true"/>
>> >         </instance_attributes>
>> >       </transient_attributes>
>> >       <lrm id="defiant">
>> >         <lrm_resources>
>> >           <lrm_resource id="st-rhevm" type="fence_rhevm"
>> class="stonith">
>> >             <lrm_rsc_op id="st-rhevm_last_0"
>> operation_key="st-rhevm_start_0" operation="start"
>> crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7"
>> transition-key="2:1:0:1e7972e8-6f9a-4325-b9c3-3d7e2950d996"
>> transition-magic="0:0;2:1:0:1e7972e8-6f9a-4325-b9c3-3d7e2950d996"
>> call-id="14" rc-code="0" op-status="0" interval="0" last-run="1369119332"
>> last-rc-change="0" exec-time="232" queue-time="0"
>> op-digest="3bc7e1ce413fe37998a289f77f85d159"/>
>> >           </lrm_resource>
>> >         </lrm_resources>
>> >       </lrm>
>> >     </node_state>
>> >     <node_state id="enterprise" uname="enterprise" in_ccm="true"
>> crmd="online" crm-debug-origin="do_update_resource" join="member"
>> expected="member">
>> >       <lrm id="enterprise">
>> >         <lrm_resources>
>> >           <lrm_resource id="st-rhevm" type="fence_rhevm"
>> class="stonith">
>> >             <lrm_rsc_op id="st-rhevm_last_0"
>> operation_key="st-rhevm_monitor_0" operation="monitor"
>> crm-debug-origin="do_update_resource" crm_feature_set="3.0.7"
>> transition-key="5:59:7:8170c498-f66b-4974-b3c0-c17eb45ba5cb"
>> transition-magic="0:7;5:59:7:8170c498-f66b-4974-b3c0-c17eb45ba5cb"
>> call-id="5" rc-code="7" op-status="0" interval="0" last-run="1369170800"
>> last-rc-change="0" exec-time="4" queue-time="0"
>> op-digest="3bc7e1ce413fe37998a289f77f85d159"/>
>> >           </lrm_resource>
>> >         </lrm_resources>
>> >       </lrm>
>> >       <transient_attributes id="enterprise">
>> >         <instance_attributes id="status-enterprise">
>> >           <nvpair id="status-enterprise-probe_complete"
>> name="probe_complete" value="true"/>
>> >         </instance_attributes>
>> >       </transient_attributes>
>> >     </node_state>
>> >   </status>
>> > </cib>
>> >
>> >
>> > The debug log output from fence_rhevm doesn't appear to show pacemaker
>> requesting the reboot at all; it only shows a "vms" query sent to the
>> hypervisor, which responds with XML listing the VMs.
>> >
>> > I can't quite see why it's failing. Are you aware of any issues with
>> fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64) not working with
>> pacemaker (pacemaker-1.1.8-7.el6.x86_64) on RHEL6.4?
>> >
>> > All the best,
>> > /John