[Pacemaker] resource starts but then fails right away

Brian J. Murrell brian at interlinx.bc.ca
Thu May 9 10:26:05 EDT 2013


Using Pacemaker 1.1.7 on EL6.3, I'm getting an intermittent recurrence
of a situation where I add a resource and start it and it seems to
start but then right away fail.  i.e.

# clean up resource before trying to start, just to make sure we start with a clean slate
# crm resource cleanup testfs-resource1
Cleaning up testfs-resource1 on node1

Waiting for 2 replies from the CRMd.. OK

# now try to start it
# crm_resource -r testfs-resource1 -p target-role -m -v Started

# monitor teh start up for success
# crm resource status testfs-resource1:

resource testfs-resource1 is NOT running

# crm resource status testfs-resource1

resource testfs-resource1 is NOT running

# crm resource status testfs-resource1

resource testfs-resource1 is NOT running

...

# crm resource status testfs-resource1

resource testfs-resource1 is NOT running

# crm resource status testfs-resource1

resource testfs-resource1 is NOT running

# crm resource status testfs-resource1
resource testfs-resource1 is running on: node1

# it started.  check once more:

# crm status
============
Last updated: Tue May  7 02:37:34 2013
Last change: Tue May  7 02:36:17 2013 via crm_resource on node1
Stack: openais
Current DC: node1 - partition WITHOUT quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
1 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ node1 ]

 st-fencing	(stonith:fence_foo):	Started node1
 resource2	(ocf::foo:Target):	Started node1
 testfs-resource1	(ocf::foo:Target):	Started node1 FAILED

Failed actions:
    testfs-resource1_monitor_0 (node=node1, call=-1, rc=1, status=Timed Out): unknown error

# but lo and behold, it failed, with a monitor operation failing.

# stop it
# crm_resource -r testfs-resource1 -p target-role -m -v Stopped: 0

The syslog for this whole operation, starting with adding the resource
is as follows:

May  7 02:36:12 node1 cib[16831]:     info: cib:diff: - <cib admin_epoch="0" epoch="15" num_updates="4" />
May  7 02:36:12 node1 crmd[16836]:     info: abort_transition_graph: te_update_diff:126 - Triggered transition abort (complete=1, tag=diff, id=(null), magic=NA, cib=0.16.1) : Non-status change
May  7 02:36:12 node1 crmd[16836]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: + <cib epoch="16" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.6" update-origin="node1" update-client="crm_resource" cib-last-written="Tue May  7 02:35:56 2013" have-quorum="0" dc-uuid="node1" >
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +   <configuration >
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +     <resources >
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +       <primitive class="ocf" provider="foo" type="Target" id="testfs-resource1" __crm_diff_marker__="added:top" >
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +         <meta_attributes id="testfs-resource1-meta_attributes" >
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +           <nvpair name="target-role" id="testfs-resource1-meta_attributes-target-role" value="Stopped" />
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +         </meta_attributes>
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +         <operations id="testfs-resource1-operations" >
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +           <op id="testfs-resource1-monitor-5" interval="5" name="monitor" timeout="60" />
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +           <op id="testfs-resource1-start-0" interval="0" name="start" timeout="300" />
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +           <op id="testfs-resource1-stop-0" interval="0" name="stop" timeout="300" />
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +         </operations>
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +         <instance_attributes id="testfs-resource1-instance_attributes" >
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +           <nvpair id="testfs-resource1-instance_attributes-target" name="target" value="364cfbf8-26dc-44c9-98ad-f8f9d0fafd9a" />
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +         </instance_attributes>
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +       </primitive>
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +     </resources>
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: +   </configuration>
May  7 02:36:12 node1 cib[16831]:     info: cib:diff: + </cib>
May  7 02:36:12 node1 cib[16831]:     info: cib_process_request: Operation complete: op cib_create for section resources (origin=local/cibadmin/2, version=0.16.1): ok (rc=0)
May  7 02:36:12 node1 pengine[16835]:   notice: unpack_config: On loss of CCM Quorum: Ignore
May  7 02:36:12 node1 crmd[16836]:   notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
May  7 02:36:12 node1 crmd[16836]:     info: do_te_invoke: Processing graph 13 (ref=pe_calc-dc-1367894172-72) derived from /var/lib/pengine/pe-input-124.bz2
May  7 02:36:12 node1 crmd[16836]:     info: te_rsc_command: Initiating action 5: monitor testfs-resource1_monitor_0 on node1 (local)
May  7 02:36:12 node1 lrmd: [16833]: info: rsc:testfs-resource1:8: probe
May  7 02:36:12 node1 pengine[16835]:   notice: process_pe_message: Transition 13: PEngine Input stored in: /var/lib/pengine/pe-input-124.bz2
May  7 02:36:14 node1 crmd[16836]:     info: abort_transition_graph: te_update_diff:126 - Triggered transition abort (complete=0, tag=diff, id=(null), magic=NA, cib=0.17.1) : Non-status change
May  7 02:36:14 node1 cib[16831]:     info: cib:diff: - <cib admin_epoch="0" epoch="16" num_updates="1" />
May  7 02:36:14 node1 cib[16831]:     info: cib:diff: + <cib epoch="17" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.6" update-origin="node1" update-client="cibadmin" cib-last-written="Tue May  7 02:36:12 2013" have-quorum="0" dc-uuid="node1" >
May  7 02:36:14 node1 cib[16831]:     info: cib:diff: +   <configuration >
May  7 02:36:14 node1 cib[16831]:     info: cib:diff: +     <constraints >
May  7 02:36:14 node1 cib[16831]:     info: cib:diff: +       <rsc_location id="testfs-resource1-primary" node="node1" rsc="testfs-resource1" score="20" __crm_diff_marker__="added:top" />
May  7 02:36:14 node1 cib[16831]:     info: cib:diff: +     </constraints>
May  7 02:36:14 node1 cib[16831]:     info: cib:diff: +   </configuration>
May  7 02:36:14 node1 cib[16831]:     info: cib:diff: + </cib>
May  7 02:36:14 node1 cib[16831]:     info: cib_process_request: Operation complete: op cib_create for section constraints (origin=local/cibadmin/2, version=0.17.1): ok (rc=0)
May  7 02:36:16 node1 cib[16831]:     info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='node1']//lrm_resource[@id='testfs-resource1'] (origin=local/crmd/249, version=0.17.1): ok (rc=0)
May  7 02:36:16 node1 crmd[16836]:     info: delete_resource: Removing resource testfs-resource1 for 18002_crm_resource (internal) on node1
May  7 02:36:16 node1 lrmd: [16833]: info: flush_op: process for operation monitor[8] on ocf::Target::testfs-resource1 for client 16836 still running, flush delayed
May  7 02:36:16 node1 crmd[16836]:     info: lrm_remove_deleted_op: Removing op testfs-resource1_monitor_0:8 for deleted resource testfs-resource1
May  7 02:36:16 node1 crmd[16836]:     info: notify_deleted: Notifying 18002_crm_resource on node1 that testfs-resource1 was deleted
May  7 02:36:16 node1 crmd[16836]:  warning: decode_transition_key: Bad UUID (crm-resource-18002) in sscanf result (3) for 0:0:crm-resource-18002
May  7 02:36:16 node1 cib[16831]:     info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='node1']//lrm_resource[@id='testfs-resource1'] (origin=local/crmd/250, version=0.17.2): ok (rc=0)
May  7 02:36:16 node1 crmd[16836]:     info: abort_transition_graph: te_update_diff:126 - Triggered transition abort (complete=0, tag=diff, id=(null), magic=NA, cib=0.18.1) : Non-status change
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: - <cib admin_epoch="0" epoch="17" num_updates="2" >
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: -   <configuration >
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: -     <crm_config >
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: -       <cluster_property_set id="cib-bootstrap-options" >
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: -         <nvpair value="1367894155" id="cib-bootstrap-options-last-lrm-refresh" />
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: -       </cluster_property_set>
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: -     </crm_config>
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: -   </configuration>
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: - </cib>
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: + <cib epoch="18" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.6" update-origin="node1" update-client="cibadmin" cib-last-written="Tue May  7 02:36:14 2013" have-quorum="0" dc-uuid="node1" >
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: +   <configuration >
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: +     <crm_config >
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: +       <cluster_property_set id="cib-bootstrap-options" >
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: +         <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1367894176" />
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: +       </cluster_property_set>
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: +     </crm_config>
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: +   </configuration>
May  7 02:36:16 node1 cib[16831]:     info: cib:diff: + </cib>
May  7 02:36:16 node1 cib[16831]:     info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/252, version=0.18.1): ok (rc=0)
May  7 02:36:16 node1 crmd[16836]:     info: ais_dispatch_message: Membership 4: quorum still lost
May  7 02:36:16 node1 crmd[16836]:     info: crmd_ais_dispatch: Setting expected votes to 2
May  7 02:36:16 node1 cib[16831]:     info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/255, version=0.18.2): ok (rc=0)
May  7 02:36:16 node1 crmd[16836]:  warning: do_update_resource: Resource testfs-resource1 no longer exists in the lrmd
May  7 02:36:16 node1 crmd[16836]:     info: process_lrm_event: LRM operation testfs-resource1_monitor_0 (call=8, rc=7, cib-update=0, confirmed=false) not running
May  7 02:36:16 node1 crmd[16836]:     info: update_history_cache: Resource testfs-resource1 no longer exists, not updating cache
May  7 02:36:17 node1 crmd[16836]:     info: abort_transition_graph: te_update_diff:126 - Triggered transition abort (complete=0, tag=diff, id=(null), magic=NA, cib=0.19.1) : Non-status change
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: - <cib admin_epoch="0" epoch="18" num_updates="2" >
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: -   <configuration >
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: -     <resources >
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: -       <primitive id="testfs-resource1" >
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: -         <meta_attributes id="testfs-resource1-meta_attributes" >
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: -           <nvpair value="Stopped" id="testfs-resource1-meta_attributes-target-role" />
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: -         </meta_attributes>
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: -       </primitive>
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: -     </resources>
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: -   </configuration>
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: - </cib>
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: + <cib epoch="19" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.6" update-origin="node1" update-client="crmd" cib-last-written="Tue May  7 02:36:16 2013" have-quorum="0" dc-uuid="node1" >
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: +   <configuration >
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: +     <resources >
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: +       <primitive class="ocf" provider="foo" type="Target" id="testfs-resource1" >
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: +         <meta_attributes id="testfs-resource1-meta_attributes" >
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: +           <nvpair name="target-role" id="testfs-resource1-meta_attributes-target-role" value="Started" />
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: +         </meta_attributes>
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: +       </primitive>
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: +     </resources>
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: +   </configuration>
May  7 02:36:17 node1 cib[16831]:     info: cib:diff: + </cib>
May  7 02:36:17 node1 cib[16831]:     info: cib_process_request: Operation complete: op cib_modify for section resources (origin=local/crm_resource/4, version=0.19.1): ok (rc=0)
May  7 02:37:32 node1 crmd[16836]:  warning: action_timer_callback: Timer popped (timeout=20000, abort_level=1000000, complete=false)
May  7 02:37:32 node1 crmd[16836]:    error: print_elem: Aborting transition, action lost: [Action 5]: In-flight (id: testfs-resource1_monitor_0, loc: node1, priority: 0)
May  7 02:37:32 node1 crmd[16836]:     info: abort_transition_graph: action_timer_callback:535 - Triggered transition abort (complete=0) : Action lost
May  7 02:37:32 node1 crmd[16836]:  warning: cib_action_update: rsc_op 5: testfs-resource1_monitor_0 on node1 timed out
May  7 02:37:32 node1 crmd[16836]:     info: create_operation_update: cib_action_update: Updating resouce testfs-resource1 after Timed Out monitor op (interval=0)
May  7 02:37:32 node1 crmd[16836]:     info: te_rsc_command: Initiating action 4: probe_complete probe_complete on node1 (local) - no waiting
May  7 02:37:32 node1 crmd[16836]:   notice: run_graph: ==== Transition 13 (Complete=2, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pengine/pe-input-124.bz2): Stopped
May  7 02:37:32 node1 crmd[16836]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
May  7 02:37:32 node1 crmd[16836]:     info: process_graph_event: Action testfs-resource1_monitor_0 arrived after a completed transition
May  7 02:37:32 node1 crmd[16836]:     info: abort_transition_graph: process_graph_event:481 - Triggered transition abort (complete=1, tag=lrm_rsc_op, id=testfs-resource1_last_failure_0, magic=2:1;5:13:7:5d728e81-4496-4eac-9f3f-ac6d21163ad7, cib=0.19.2) : Inactive graph
May  7 02:37:32 node1 pengine[16835]:   notice: unpack_config: On loss of CCM Quorum: Ignore
May  7 02:37:32 node1 pengine[16835]:  warning: unpack_rsc_op: Processing failed op testfs-resource1_last_failure_0 on node1: unknown error (1)
May  7 02:37:32 node1 pengine[16835]:   notice: LogActions: Recover testfs-resource1#011(Started node1)
May  7 02:37:32 node1 crmd[16836]:     info: handle_response: pe_calc calculation pe_calc-dc-1367894252-77 is obsolete
May  7 02:37:32 node1 pengine[16835]:   notice: process_pe_message: Transition 14: PEngine Input stored in: /var/lib/pengine/pe-input-125.bz2
May  7 02:37:32 node1 pengine[16835]:   notice: unpack_config: On loss of CCM Quorum: Ignore
May  7 02:37:32 node1 pengine[16835]:  warning: unpack_rsc_op: Processing failed op testfs-resource1_last_failure_0 on node1: unknown error (1)
May  7 02:37:32 node1 pengine[16835]:   notice: LogActions: Recover testfs-resource1#011(Started node1)
May  7 02:37:32 node1 crmd[16836]:   notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
May  7 02:37:32 node1 crmd[16836]:     info: do_te_invoke: Processing graph 15 (ref=pe_calc-dc-1367894252-78) derived from /var/lib/pengine/pe-input-125.bz2
May  7 02:37:32 node1 crmd[16836]:     info: te_rsc_command: Initiating action 2: stop testfs-resource1_stop_0 on node1 (local)
May  7 02:37:32 node1 lrmd: [16833]: info: rsc:testfs-resource1:9: stop
May  7 02:37:32 node1 pengine[16835]:   notice: process_pe_message: Transition 15: PEngine Input stored in: /var/lib/pengine/pe-input-125.bz2
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: - <cib admin_epoch="0" epoch="19" num_updates="2" >
May  7 02:37:34 node1 crmd[16836]:     info: abort_transition_graph: te_update_diff:126 - Triggered transition abort (complete=0, tag=diff, id=(null), magic=NA, cib=0.20.1) : Non-status change
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: -   <configuration >
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: -     <resources >
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: -       <primitive id="testfs-resource1" >
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: -         <meta_attributes id="testfs-resource1-meta_attributes" >
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: -           <nvpair value="Started" id="testfs-resource1-meta_attributes-target-role" />
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: -         </meta_attributes>
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: -       </primitive>
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: -     </resources>
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: -   </configuration>
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: - </cib>
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: + <cib epoch="20" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.6" update-origin="node1" update-client="crm_resource" cib-last-written="Tue May  7 02:36:17 2013" have-quorum="0" dc-uuid="node1" >
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: +   <configuration >
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: +     <resources >
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: +       <primitive class="ocf" provider="foo" type="Target" id="testfs-resource1" >
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: +         <meta_attributes id="testfs-resource1-meta_attributes" >
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: +           <nvpair name="target-role" id="testfs-resource1-meta_attributes-target-role" value="Stopped" />
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: +         </meta_attributes>
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: +       </primitive>
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: +     </resources>
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: +   </configuration>
May  7 02:37:34 node1 cib[16831]:     info: cib:diff: + </cib>
May  7 02:37:34 node1 cib[16831]:     info: cib_process_request: Operation complete: op cib_modify for section resources (origin=local/crm_resource/4, version=0.20.1): ok (rc=0)
May  7 02:37:36 node1 crmd[16836]:     info: process_lrm_event: LRM operation testfs-resource1_stop_0 (call=9, rc=0, cib-update=259, confirmed=true) ok
May  7 02:37:36 node1 crmd[16836]:   notice: run_graph: ==== Transition 15 (Complete=1, Pending=0, Fired=0, Skipped=3, Incomplete=0, Source=/var/lib/pengine/pe-input-125.bz2): Stopped
May  7 02:37:36 node1 crmd[16836]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
May  7 02:37:36 node1 pengine[16835]:   notice: unpack_config: On loss of CCM Quorum: Ignore
May  7 02:37:36 node1 pengine[16835]:  warning: unpack_rsc_op: Processing failed op testfs-resource1_last_failure_0 on node1: unknown error (1)
May  7 02:37:36 node1 crmd[16836]:   notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
May  7 02:37:36 node1 crmd[16836]:     info: do_te_invoke: Processing graph 16 (ref=pe_calc-dc-1367894256-80) derived from /var/lib/pengine/pe-input-126.bz2
May  7 02:37:36 node1 crmd[16836]:   notice: run_graph: ==== Transition 16 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-126.bz2): Complete
May  7 02:37:36 node1 crmd[16836]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
May  7 02:37:36 node1 pengine[16835]:   notice: process_pe_message: Transition 16: PEngine Input stored in: /var/lib/pengine/pe-input-126.bz2

I do see the:

May  7 02:37:32 node1 crmd[16836]:    error: print_elem: Aborting transition, action lost: [Action 5]: In-flight (id: testfs-resource1_monitor_0, loc: node1, priority: 0)

in the log.  Is that the root cause of the problem?  If so, what's that
trying to tell me, exactly?  If not, what is the cause of the problem?

It really can't be the RA timing out since I give the monitor operation
a 60 second timeout and the status action of the RA only take a few
seconds at most to run and is not really an operation that can get
blocked on anything.  It's effectively the grepping of a file.

Cheers,
b.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 263 bytes
Desc: OpenPGP digital signature
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20130509/0cdb172a/attachment-0002.sig>


More information about the Pacemaker mailing list