[Pacemaker] Exec Failure issues.

James Horsfall (CTR) jameshorsfall at stratosgsi.com
Tue Oct 18 14:17:13 EDT 2011


Quick update to this. Log output follows:

 

Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+     <resources >

Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+       <group id="IPS" >

Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+         <primitive id="ETH2" >

Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+           <meta_attributes id="ETH2-meta_attributes" >

Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+             <nvpair id="ETH2-meta_attributes-is-managed"
name="is-managed" value="true" __crm_diff_marker__="added:top" />

Oct 18 18:11:20 localhost crmd: [2642]: info: abort_transition_graph:
need_abort:59 - Triggered transition abort (complete=1) : Non-status
change

Oct 18 18:11:20 localhost crmd: [2642]: info: need_abort: Aborting on
change to admin_epoch

Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+           </meta_attributes>

Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+         </primitive>

Oct 18 18:11:20 localhost crmd: [2642]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]

Oct 18 18:11:20 localhost crmd: [2642]: info: do_state_transition: All 2
cluster nodes are eligible to run resources.

Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+       </group>

Oct 18 18:11:20 localhost crmd: [2642]: info: do_pe_invoke: Query 80:
Requesting the current CIB: S_POLICY_ENGINE

Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+     </resources>

Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+   </configuration>

Oct 18 18:11:20 localhost cib: [2638]: info: log_data_element: cib:diff:
+ </cib>

Oct 18 18:11:20 localhost cib: [2638]: info: cib_process_request:
Operation complete: op cib_replace for section resources
(origin=local/cibadmin/2, version=0.19.1): ok (rc=0)

Oct 18 18:11:20 localhost crmd: [2642]: info: do_pe_invoke_callback:
Invoking the PE: query=80, ref=pe_calc-dc-1318961480-51, seq=976,
quorate=1

Oct 18 18:11:20 localhost pengine: [2641]: info: unpack_config: Startup
probes: enabled

Oct 18 18:11:20 localhost pengine: [2641]: notice: unpack_config: On
loss of CCM Quorum: Ignore

Oct 18 18:11:20 localhost pengine: [2641]: info: unpack_config: Node
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0

Oct 18 18:11:20 localhost pengine: [2641]: info: unpack_domains:
Unpacking domains

Oct 18 18:11:20 localhost pengine: [2641]: info:
determine_online_status: Node sgn-pau-hub1 is online

Oct 18 18:11:20 localhost pengine: [2641]: ERROR: unpack_rsc_op: Hard
error - ETH2_stop_0 failed with rc=3: Preventing ETH2 from re-starting
on sgn-pau-hub1

Oct 18 18:11:20 localhost pengine: [2641]: WARN: unpack_rsc_op:
Processing failed op ETH2_stop_0 on sgn-pau-hub1: unimplemented feature
(3)

Oct 18 18:11:20 localhost pengine: [2641]: info: native_add_running:
resource ETH2 isnt managed

Oct 18 18:11:20 localhost pengine: [2641]: WARN: unpack_rsc_op:
Processing failed op ETH3_stop_0 on sgn-pau-hub1: unknown exec error
(-2)

Oct 18 18:11:20 localhost pengine: [2641]: info: native_add_running:
resource ETH3 isnt managed

Oct 18 18:11:20 localhost pengine: [2641]: info:
determine_online_status: Node sgn-pau-hub0 is online

Oct 18 18:11:20 localhost pengine: [2641]: notice: group_print:
Resource Group: IPS

Oct 18 18:11:20 localhost pengine: [2641]: notice: native_print:
ETH2#011(ocf::heartbeat:IPaddr):#011Started sgn-pau-hub1 (unmanaged)
FAILED

Oct 18 18:11:20 localhost pengine: [2641]: notice: native_print:
ETH3#011(ocf::heartbeat:IPaddr):#011Started sgn-pau-hub1 (unmanaged)
FAILED

Oct 18 18:11:20 localhost pengine: [2641]: notice: clone_print:  Clone
Set: ping-On-both

Oct 18 18:11:20 localhost pengine: [2641]: notice: short_print:
Started: [ sgn-pau-hub1 sgn-pau-hub0 ]

Oct 18 18:11:20 localhost pengine: [2641]: info: get_failcount: ETH2 has
failed INFINITY times on sgn-pau-hub1

Oct 18 18:11:20 localhost pengine: [2641]: WARN:
common_apply_stickiness: Forcing ETH2 away from sgn-pau-hub1 after
1000000 failures (max=1000000)

Oct 18 18:11:20 localhost pengine: [2641]: info: get_failcount: ETH3 has
failed INFINITY times on sgn-pau-hub1

Oct 18 18:11:20 localhost pengine: [2641]: WARN:
common_apply_stickiness: Forcing ETH3 away from sgn-pau-hub1 after
1000000 failures (max=1000000)

Oct 18 18:11:20 localhost pengine: [2641]: info: native_color: Unmanaged
resource ETH2 allocated to 'nowhere': failed

Oct 18 18:11:20 localhost pengine: [2641]: info: native_color: Unmanaged
resource ETH3 allocated to 'nowhere': failed

Oct 18 18:11:20 localhost pengine: [2641]: notice: LogActions: Leave
resource ETH2#011(Started unmanaged)

Oct 18 18:11:20 localhost pengine: [2641]: notice: LogActions: Leave
resource ETH3#011(Started unmanaged)

Oct 18 18:11:20 localhost pengine: [2641]: notice: LogActions: Leave
resource peth2:0#011(Started sgn-pau-hub1)

Oct 18 18:11:20 localhost pengine: [2641]: notice: LogActions: Leave
resource peth2:1#011(Started sgn-pau-hub0)

Oct 18 18:11:20 localhost pengine: [2641]: WARN: should_dump_input:
Ignoring requirement that ETH2_stop_0 comeplete before IPS_stopped_0:
unmanaged failed resources cannot prevent shutdown

Oct 18 18:11:20 localhost pengine: [2641]: WARN: should_dump_input:
Ignoring requirement that ETH3_stop_0 comeplete before IPS_stopped_0:
unmanaged failed resources cannot prevent shutdown

Oct 18 18:11:20 localhost pengine: [2641]: WARN: should_dump_input:
Ignoring requirement that ETH3_stop_0 comeplete before IPS_stopped_0:
unmanaged failed resources cannot prevent shutdown

Oct 18 18:11:20 localhost crmd: [2642]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]

Oct 18 18:11:20 localhost crmd: [2642]: info: unpack_graph: Unpacked
transition 15: 2 actions in 2 synapses

Oct 18 18:11:20 localhost crmd: [2642]: info: do_te_invoke: Processing
graph 15 (ref=pe_calc-dc-1318961480-51) derived from
/var/lib/pengine/pe-input-73.bz2

Oct 18 18:11:20 localhost crmd: [2642]: info: te_pseudo_action: Pseudo
action 15 fired and confirmed

Oct 18 18:11:20 localhost crmd: [2642]: info: te_pseudo_action: Pseudo
action 16 fired and confirmed

Oct 18 18:11:20 localhost crmd: [2642]: info: run_graph:
====================================================

Oct 18 18:11:20 localhost crmd: [2642]: notice: run_graph: Transition 15
(Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-73.bz2): Complete

Oct 18 18:11:20 localhost crmd: [2642]: info: te_graph_trigger:
Transition 15 is now complete

Oct 18 18:11:20 localhost crmd: [2642]: info: notify_crmd: Transition 15
status: done - <null>

Oct 18 18:11:20 localhost crmd: [2642]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]

Oct 18 18:11:20 localhost crmd: [2642]: info: do_state_transition:
Starting PEngine Recheck Timer

Oct 18 18:11:20 localhost pengine: [2641]: info: process_pe_message:
Transition 15: PEngine Input stored in: /var/lib/pengine/pe-input-73.bz2

Oct 18 18:12:02 localhost cib: [2638]: info: cib_stats: Processed 147
operations (5102.00us average, 0% utilization) in the last 10min

 

From: James Horsfall (CTR) [mailto:jameshorsfall at stratosgsi.com] 
Sent: Tuesday, October 18, 2011 1:39 PM
To: pacemaker at oss.clusterlabs.org
Subject: [Pacemaker] Exec Failure issues.

 

Hello all, I'm having some problems getting resources to fail over
properly. I need the IPs to switch to a different node when the node
cannot ping. We're doing a "shut" on the respective interfaces to
simulate cables being unplugged, but I keep getting exec timeouts and
unknown errors.

crm_mon -fortA

============

Last updated: Tue Oct 18 17:32:10 2011

Stack: openais

Current DC: sgn-pau-hub0 - partition with quorum

Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe

2 Nodes configured, 2 expected votes

2 Resources configured.

============

Online: [ sgn-pau-hub0 sgn-pau-hub1 ]

Full list of resources:

 Resource Group: IPS

     ETH2       (ocf::heartbeat:IPaddr):        Started sgn-pau-hub0 (unmanaged) FAILED

     ETH3       (ocf::heartbeat:IPaddr):        Started sgn-pau-hub0 (unmanaged) FAILED

 Clone Set: ping-On-both

     peth2:1    (ocf::pacemaker:ping):  Started sgn-pau-hub0 FAILED

     Started: [ sgn-pau-hub1 ]

Node Attributes:

* Node sgn-pau-hub0:   #sometimes this says :1000 (degraded)

* Node sgn-pau-hub1:

    + pingd                             : 2000

Operations:

* Node sgn-pau-hub0:

   ETH2: migration-threshold=1000000

    + (5) start: last-rc-change='Tue Oct 18 17:28:11 2011' last-run='Tue Oct 18 17:28:11 2011' exec-time=100ms queue-time=0ms rc=0 (ok)

    + (7) monitor: interval=30000ms last-rc-change='Tue Oct 18 17:28:11 2011' last-run='Tue Oct 18 17:28:41 2011' exec-time=30ms queue-time=0ms rc=0 (ok)

    + (15) stop: last-rc-change='Tue Oct 18 17:30:29 2011' last-run='Tue Oct 18 17:30:09 2011' exec-time=20000ms queue-time=0ms rc=-2 (unknown exec error)

   ETH3: migration-threshold=1000000

    + (8) start: last-rc-change='Tue Oct 18 17:28:11 2011' last-run='Tue Oct 18 17:28:11 2011' exec-time=80ms queue-time=0ms rc=0 (ok)

    + (9) monitor: interval=30000ms last-rc-change='Tue Oct 18 17:28:11 2011' last-run='Tue Oct 18 17:28:41 2011' exec-time=30ms queue-time=0ms rc=0 (ok)

    + (12) stop: last-rc-change='Tue Oct 18 17:29:45 2011' last-run='Tue Oct 18 17:29:25 2011' exec-time=20000ms queue-time=0ms rc=-2 (unknown exec error)

   peth2:1: migration-threshold=1000000

    + (24) stop: last-rc-change='Tue Oct 18 17:33:10 2011' last-run='Tue Oct 18 17:33:10 2011' exec-time=10020ms queue-time=0ms rc=0 (ok)

    + (25) start: last-rc-change='Tue Oct 18 17:33:20 2011' last-run='Tue Oct 18 17:33:20 2011' exec-time=19030ms queue-time=0ms rc=1 (unknown error)

* Node sgn-pau-hub1:

   peth2:0: migration-threshold=1000000

    + (5) start: last-rc-change='Tue Oct 18 17:26:36 2011' last-run='Tue Oct 18 17:26:36 2011' exec-time=8070ms queue-time=0ms rc=0 (ok)

    + (6) monitor: interval=10000ms last-rc-change='Tue Oct 18 17:26:45 2011' last-run='Tue Oct 18 17:27:21 2011' exec-time=8030ms queue-time=0ms rc=0 (ok)

Failed actions:

    ETH2_stop_0 (node=sgn-pau-hub0, call=15, rc=-2, status=Timed Out): unknown exec error

    ETH3_stop_0 (node=sgn-pau-hub0, call=12, rc=-2, status=Timed Out): unknown exec error

    peth2:1_start_0 (node=sgn-pau-hub0, call=25, rc=1, status=complete): unknown error

------------------------------ CRM configuration ------------------------------

node sgn-pau-hub0
node sgn-pau-hub1
primitive ETH2 ocf:heartbeat:IPaddr \
        params ip="10.151.9.42" cidr_netmask="255.255.255.248" nic="eth2" \
        op monitor interval="30s" timeout="60" \
        meta target-role="Started" allow-migrate="true"
primitive ETH3 ocf:heartbeat:IPaddr \
        params ip="10.151.9.49" cidr_netmask="255.255.255.248" nic="eth3" \
        op monitor interval="30s" timeout="60" \
        meta target-role="Started" allow-migrate="true"
primitive peth2 ocf:pacemaker:ping \
        params multiplier="1000" host_list="10.151.9.41 10.151.9.50" \
        operations $id="peth2-operations" \
        op monitor interval="10" timeout="20"
group IPS ETH2 ETH3 \
        meta target-role="Started"
clone ping-On-both peth2 \
        meta target-role="Started"
location UPchk IPS \
        rule $id="UPchk-rule" pingd: defined pingd
property $id="cib-bootstrap-options" \
        dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
        cluster-infrastructure="openais" \
        stonith-enabled="false" \
        default-resource-stickiness="100" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1318948973" \
        expected-quorum-votes="2"

 

------------------------------ Cib.xml ------------------------------

<?xml version="1.0" ?>

<cib admin_epoch="0" crm_feature_set="3.0.2" dc-uuid="sgn-pau-hub0"
epoch="10" have-quorum="1" num_updates="5"
validate-with="pacemaker-1.2">

  <configuration>

    <crm_config>

      <cluster_property_set id="cib-bootstrap-options">

        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe"/>

        <nvpair id="cib-bootstrap-options-cluster-infrastructure"
name="cluster-infrastructure" value="openais"/>

        <nvpair id="cib-bootstrap-options-stonith-enabled"
name="stonith-enabled" value="false"/>

        <nvpair id="cib-bootstrap-options-default-resource-stickiness"
name="default-resource-stickiness" value="100"/>

        <nvpair id="cib-bootstrap-options-no-quorum-policy"
name="no-quorum-policy" value="ignore"/>

        <nvpair id="cib-bootstrap-options-last-lrm-refresh"
name="last-lrm-refresh" value="1318948973"/>

        <nvpair id="cib-bootstrap-options-expected-quorum-votes"
name="expected-quorum-votes" value="2"/>

      </cluster_property_set>

    </crm_config>

    <rsc_defaults/>

    <op_defaults/>

    <nodes>

      <node id="sgn-pau-hub1" type="normal" uname="sgn-pau-hub1"/>

      <node id="sgn-pau-hub0" type="normal" uname="sgn-pau-hub0"/>

    </nodes>

    <resources>

      <clone id="ping-On-both">

        <meta_attributes id="ping-On-both-meta_attributes">

          <nvpair id="ping-On-both-meta_attributes-target-role"
name="target-role" value="Started"/>

        </meta_attributes>

        <primitive class="ocf" id="peth2" provider="pacemaker"
type="ping">

          <instance_attributes id="peth2-instance_attributes">

            <nvpair id="peth2-instance_attributes-multiplier"
name="multiplier" value="1000"/>

            <nvpair id="peth2-instance_attributes-host_list"
name="host_list" value="10.151.9.41 10.151.9.50"/>

          </instance_attributes>

          <operations id="peth2-operations">

            <op id="peth2-monitor-10" interval="10" name="monitor"
timeout="20"/>

          </operations>

        </primitive>

      </clone>

      <group id="IPS">

        <meta_attributes id="IPS-meta_attributes">

          <nvpair id="IPS-meta_attributes-target-role"
name="target-role" value="Started"/>

        </meta_attributes>

        <primitive class="ocf" id="ETH2" provider="heartbeat"
type="IPaddr">

          <instance_attributes id="ETH2-instance_attributes">

            <nvpair id="ETH2-instance_attributes-ip" name="ip"
value="10.151.9.42"/>

            <nvpair id="ETH2-instance_attributes-cidr_netmask"
name="cidr_netmask" value="255.255.255.248"/>

            <nvpair id="ETH2-instance_attributes-nic" name="nic"
value="eth2"/>

          </instance_attributes>

          <operations>

            <op id="ETH2-monitor-30s" interval="30s" name="monitor"
timeout="60"/>

          </operations>

          <meta_attributes id="ETH2-meta_attributes">

            <nvpair id="ETH2-meta_attributes-target-role"
name="target-role" value="Started"/>

            <nvpair id="ETH2-meta_attributes-allow-migrate"
name="allow-migrate" value="true"/>

          </meta_attributes>

        </primitive>

        <primitive class="ocf" id="ETH3" provider="heartbeat"
type="IPaddr">

          <instance_attributes id="ETH3-instance_attributes">

            <nvpair id="ETH3-instance_attributes-ip" name="ip"
value="10.151.9.49"/>

            <nvpair id="ETH3-instance_attributes-cidr_netmask"
name="cidr_netmask" value="255.255.255.248"/>

            <nvpair id="ETH3-instance_attributes-nic" name="nic"
value="eth3"/>

          </instance_attributes>

          <operations>

            <op id="ETH3-monitor-30s" interval="30s" name="monitor"
timeout="60"/>

          </operations>

          <meta_attributes id="ETH3-meta_attributes">

            <nvpair id="ETH3-meta_attributes-target-role"
name="target-role" value="Started"/>

            <nvpair id="ETH3-meta_attributes-allow-migrate"
name="allow-migrate" value="true"/>

          </meta_attributes>

        </primitive>

      </group>

    </resources>

    <constraints>

      <rsc_location id="UPchk" rsc="IPS">

        <rule id="UPchk-rule" score-attribute="pingd">

          <expression attribute="pingd" id="UPchk-expression"
operation="defined"/>

        </rule>

      </rsc_location>

    </constraints>

  </configuration>

</cib>
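
For reference, here is a minimal sketch of how an explicit stop timeout could be declared for ETH2 in the same CIB syntax. The CIB above defines only a monitor op for ETH2 and ETH3, and both stop operations were reported as Timed Out with exec-time=20000ms. The op id "ETH2-stop-0" and the timeout value of 60 are illustrative assumptions, not settings taken from this cluster, and a longer timeout is not confirmed to resolve the failure.

```xml
<!-- Hypothetical fragment: an explicit stop op placed next to the existing
     monitor op inside the ETH2 primitive. The id "ETH2-stop-0" and
     timeout="60" are assumed values for illustration only. -->
<operations>
  <op id="ETH2-monitor-30s" interval="30s" name="monitor" timeout="60"/>
  <op id="ETH2-stop-0" interval="0" name="stop" timeout="60"/>
</operations>
```

The same pattern would apply to ETH3 with its own unique op id.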
