[Pacemaker] Critical: Monitor operation of IPaddr2 timing out, taking more than 60s. Fails to recover.

Parshvi parshvi.17 at gmail.com
Thu Aug 9 01:14:02 EDT 2012


Hi,

The monitor operation of the IPaddr2 resource agent is timing out.
Interval: 5s
Timeout: 60s
The timeout was increased from 20s to 60s. Even so, the logs still show the 
monitor operation timing out multiple times.

1) What can cause the monitor to take so long?
2) Looking at the pe-input, what counts toward the operation time? Is it just 
the exec-time, or exec-time + queue-time?
3) Is there a recommended solution?

Here is the lrm section of a pe-input snapshot taken while the timeout was 
still configured at 20s. It shows the monitor operation timing out:

<lrm_resource id="Group_1_ClusterIP" type="IPaddr2" class="ocf" provider="heartbeat">
  <lrm_rsc_op id="Group_1_ClusterIP_monitor_0" operation="monitor"
      crm-debug-origin="build_active_RAs" crm_feature_set="3.0.1"
      transition-key="28:0:7:6b445452-980a-455f-8616-7bd12f20843e"
      transition-magic="0:7;28:0:7:6b445452-980a-455f-8616-7bd12f20843e"
      call-id="10" rc-code="7" op-status="0" interval="0"
      last-run="1343738096" last-rc-change="1343738096"
      exec-time="20" queue-time="30"
      op-digest="f22a042c86b227078b239707d4e4d4a2"/>

  <lrm_rsc_op id="Group_1_ClusterIP_start_0" operation="start"
      crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
      transition-key="87:27957:0:6b445452-980a-455f-8616-7bd12f20843e"
      transition-magic="0:0;87:27957:0:6b445452-980a-455f-8616-7bd12f20843e"
      call-id="83503" rc-code="0" op-status="0" interval="0"
      last-run="1343928908" last-rc-change="1343928908"
      exec-time="280" queue-time="20"
      op-digest="f22a042c86b227078b239707d4e4d4a2"/>

  <lrm_rsc_op id="Group_1_ClusterIP_monitor_5000" operation="monitor"
      crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
      transition-key="12:27957:0:6b445452-980a-455f-8616-7bd12f20843e"
      transition-magic="2:-2;12:27957:0:6b445452-980a-455f-8616-7bd12f20843e"
      call-id="83504" rc-code="-2" op-status="2" interval="5000"
      last-rc-change="1343928921"
      exec-time="20000" queue-time="0"
      op-digest="79c3bdd01c6e0fd819484536a54bf7a2"/>
(Please note exec-time=20000)

  <lrm_rsc_op id="Group_1_ClusterIP_stop_0" operation="stop"
      crm-debug-origin="do_update_resource" crm_feature_set="3.0.1"
      transition-key="13:27957:0:6b445452-980a-455f-8616-7bd12f20843e"
      transition-magic="0:0;13:27957:0:6b445452-980a-455f-8616-7bd12f20843e"
      call-id="83497" rc-code="0" op-status="0" interval="0"
      last-run="1343928906" last-rc-change="1343928906"
      exec-time="1190" queue-time="30"
      op-digest="f22a042c86b227078b239707d4e4d4a2"/>
</lrm_resource>
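
To make the timing fields in the snapshot easier to compare, here is a minimal 
Python sketch that tabulates exec-time and queue-time for each lrm_rsc_op. It 
rests on assumptions: the embedded XML is an abbreviated copy of the snippet 
(only the attributes the script reads), the times are taken to be milliseconds 
(inferred from exec-time=20000 with a 20s timeout), and op-status="2" is treated 
as "timed out" because that is what the failed monitor entry carries. The names 
summarize_ops and PE_INPUT are made up for this illustration. It prints both 
exec-time and exec-time + queue-time without claiming which one is checked 
against the timeout.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the lrm_resource snippet (only the attributes read below).
PE_INPUT = """
<lrm_resource id="Group_1_ClusterIP" type="IPaddr2" class="ocf" provider="heartbeat">
  <lrm_rsc_op id="Group_1_ClusterIP_monitor_0" operation="monitor"
      rc-code="7" op-status="0" interval="0" exec-time="20" queue-time="30"/>
  <lrm_rsc_op id="Group_1_ClusterIP_start_0" operation="start"
      rc-code="0" op-status="0" interval="0" exec-time="280" queue-time="20"/>
  <lrm_rsc_op id="Group_1_ClusterIP_monitor_5000" operation="monitor"
      rc-code="-2" op-status="2" interval="5000" exec-time="20000" queue-time="0"/>
  <lrm_rsc_op id="Group_1_ClusterIP_stop_0" operation="stop"
      rc-code="0" op-status="0" interval="0" exec-time="1190" queue-time="30"/>
</lrm_resource>
"""

# Assumption: op-status "2" marks a timed-out operation, as in the failed monitor.
TIMED_OUT_STATUS = "2"


def summarize_ops(xml_text):
    """Return one dict per lrm_rsc_op with its timing fields (assumed ms)."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for op in root.iter("lrm_rsc_op"):
        exec_ms = int(op.get("exec-time", "0"))
        queue_ms = int(op.get("queue-time", "0"))
        rows.append({
            "id": op.get("id"),
            "exec_ms": exec_ms,
            "queue_ms": queue_ms,
            "total_ms": exec_ms + queue_ms,  # shown for comparison only
            "timed_out": op.get("op-status") == TIMED_OUT_STATUS,
        })
    return rows


if __name__ == "__main__":
    for row in summarize_ops(PE_INPUT):
        flag = "TIMED OUT" if row["timed_out"] else "ok"
        print(f'{row["id"]}: exec={row["exec_ms"]}ms '
              f'queue={row["queue_ms"]}ms total={row["total_ms"]}ms {flag}')
```

Run against a full pe-input file, the same loop would work by passing the file 
contents to summarize_ops, since it walks every lrm_rsc_op element it finds.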


Please let me know if any other information is required. I would appreciate 
any early help or solution.

Thanks,
Parshvi





