[Pacemaker] Apparent delay in starting resource during migration

Phil Armstrong pma at sgi.com
Tue Mar 22 10:35:47 EDT 2011


Hi,

I have 4 primitive resources that I was in the process of migrating from 
one node (lotus) to the other (aston). I was using the 'crm resource 
migrate' command (4 times). Normally this process works just fine, but 
in this instance, one of the resources (copan_shelf_07) got 'stuck', in 
that it didn't get started on the target node for about 15 minutes after 
the others (04, 05, 06), and then it seems it was kicked off by 
crm_timer_popped. I'm hoping someone can tell me why copan_shelf_07 took 
so long to be started. I have the complete hb_report if that is 
required. Here are the basics:

Corosync Cluster Engine, version '1.2.6' SVN revision 'exported'
Copyright (c) 2006-2009 Red Hat, Inc.
cluster-glue: 1.0.6 (ba146a145a3ede967af48e8936ac414984aa1e5f)
resource-agents: # Build version: d206bc8f13039b332e76a93a86e8e550b67781da
CRM Version: 1.1.2 (ecb1e2ea172ba2551f0bd763e557fccde68c849b)
pacemaker 1.1.2-0.7.1 - SUSE Linux Enterprise 11 x86_64
libpacemaker3 1.1.2-0.7.1 - SUSE Linux Enterprise 11 x86_64
openais 1.1.3-0.2.3 - SUSE Linux Enterprise 11 x86_64
corosync 1.2.6-0.2.2 - SUSE Linux Enterprise 11 x86_64
libcorosync4 1.2.6-0.2.2 - SUSE Linux Enterprise 11 x86_64
resource-agents 1.0.3-0.4.8 - SUSE Linux Enterprise 11 x86_64
cluster-glue 1.0.6-0.3.7 - SUSE Linux Enterprise 11 x86_64
libglue2 1.0.6-0.3.7 - SUSE Linux Enterprise 11 x86_64
ldirectord 1.0.3-0.4.8 - SUSE Linux Enterprise 11 x86_64

Here are the copan_shelf_07 syslog messages:

Mar 22 07:45:07 aston pengine: [10961]: notice: LogActions: Leave 
resource copan_shelf_07       (Started lotus)
Mar 22 07:45:08 aston cib: [10958]: info: log_data_element: cib:diff: 
+       <rsc_location id="cli-prefer-copan_shelf_07" 
rsc="copan_shelf_07" __crm_diff_marker__="added:top" >
Mar 22 07:45:08 aston cib: [10958]: info: log_data_element: cib:diff: 
+         <rule id="cli-prefer-rule-copan_shelf_07" score="INFINITY" 
boolean-op="and" >
Mar 22 07:45:08 aston cib: [10958]: info: log_data_element: cib:diff: 
+           <expression id="cli-prefer-expr-copan_shelf_07" 
attribute="#uname" operation="eq" value="aston" type="string" />
Mar 22 07:45:10 aston pengine: [10961]: notice: native_print: 
copan_shelf_07    (ocf::sgi:copan_ov_client):     Started lotus
Mar 22 07:45:10 aston pengine: [10961]: notice: RecurringOp:  Start 
recurring monitor (30s) for copan_shelf_07 on aston
Mar 22 07:45:10 aston pengine: [10961]: notice: LogActions: Move 
resource copan_shelf_07        (Started lotus -> aston)
Mar 22 07:45:10 aston pengine: [10961]: notice: native_print: 
copan_shelf_07    (ocf::sgi:copan_ov_client):     Started lotus
Mar 22 07:45:10 aston pengine: [10961]: notice: RecurringOp:  Start 
recurring monitor (30s) for copan_shelf_07 on aston
Mar 22 07:45:10 aston pengine: [10961]: notice: LogActions: Move 
resource copan_shelf_07        (Started lotus -> aston)
Mar 22 07:45:10 aston crmd: [10962]: info: te_rsc_command: Initiating 
action 17: stop copan_shelf_07_stop_0 on lotus
Mar 22 07:45:12 aston crmd: [10962]: WARN: status_from_rc: Action 17 
(copan_shelf_07_stop_0) on lotus failed (target: 0 vs. rc: 1): Error
Mar 22 07:45:12 aston crmd: [10962]: WARN: update_failcount: Updating 
failcount for copan_shelf_07 on lotus after failed stop: rc=1 
(update=INFINITY, time=1300797912)
Mar 22 07:45:12 aston crmd: [10962]: info: abort_transition_graph: 
match_graph_event:276 - Triggered transition abort (complete=0, 
tag=lrm_rsc_op, id=copan_shelf_07_stop_0, 
magic=0:1;17:509:0:cebe491c-0397-4ee8-bdd0-ae6ccd1e5c7d, cib=0.920.5) : 
Event failed
Mar 22 07:45:12 aston crmd: [10962]: info: match_graph_event: Action 
copan_shelf_07_stop_0 (17) confirmed on lotus (rc=4)
Mar 22 07:45:17 aston pengine: [10961]: WARN: unpack_rsc_op: Processing 
failed op copan_shelf_07_stop_0 on lotus: unknown error (1)
Mar 22 07:45:17 aston pengine: [10961]: ERROR: unpack_rsc_op: Making 
sure copan_shelf_07 doesn't come up again
Mar 22 07:45:17 aston pengine: [10961]: notice: native_print: 
copan_shelf_07    (ocf::sgi:copan_ov_client):     Started lotus FAILED
Mar 22 07:45:17 aston pengine: [10961]: info: get_failcount: 
copan_shelf_07 has failed INFINITY times on lotus
Mar 22 07:45:17 aston pengine: [10961]: WARN: common_apply_stickiness: 
Forcing copan_shelf_07 away from lotus after 1000000 failures (max=1)
Mar 22 07:45:17 aston pengine: [10961]: info: rsc_merge_weights: 
cxfs-client-clone: Rolling back scores from copan_shelf_07
Mar 22 07:45:17 aston pengine: [10961]: info: native_color: Resource 
copan_shelf_07 cannot run anywhere
Mar 22 07:45:17 aston pengine: [10961]: notice: LogActions: Stop 
resource copan_shelf_07        (lotus)
Mar 22 07:45:17 aston pengine: [10961]: WARN: unpack_rsc_op: Processing 
failed op copan_shelf_07_stop_0 on lotus: unknown error (1)
Mar 22 07:45:17 aston pengine: [10961]: ERROR: unpack_rsc_op: Making 
sure copan_shelf_07 doesn't come up again
Mar 22 07:45:17 aston pengine: [10961]: notice: native_print: 
copan_shelf_07    (ocf::sgi:copan_ov_client):     Started lotus FAILED
Mar 22 07:45:17 aston pengine: [10961]: info: get_failcount: 
copan_shelf_07 has failed INFINITY times on lotus
Mar 22 07:45:17 aston pengine: [10961]: WARN: common_apply_stickiness: 
Forcing copan_shelf_07 away from lotus after 1000000 failures (max=1)
Mar 22 07:45:17 aston pengine: [10961]: info: rsc_merge_weights: 
cxfs-client-clone: Rolling back scores from copan_shelf_07
Mar 22 07:45:17 aston pengine: [10961]: info: native_color: Resource 
copan_shelf_07 cannot run anywhere
Mar 22 07:45:17 aston pengine: [10961]: notice: LogActions: Stop 
resource copan_shelf_07        (lotus)
Mar 22 07:45:18 aston crmd: [10962]: info: te_rsc_command: Initiating 
action 11: stop copan_shelf_07_stop_0 on lotus
Mar 22 07:45:19 aston crmd: [10962]: WARN: status_from_rc: Action 11 
(copan_shelf_07_stop_0) on lotus failed (target: 0 vs. rc: 1): Error
Mar 22 07:45:19 aston crmd: [10962]: WARN: update_failcount: Updating 
failcount for copan_shelf_07 on lotus after failed stop: rc=1 
(update=INFINITY, time=1300797919)
Mar 22 07:45:19 aston crmd: [10962]: info: abort_transition_graph: 
match_graph_event:276 - Triggered transition abort (complete=0, 
tag=lrm_rsc_op, id=copan_shelf_07_stop_0, 
magic=0:1;11:511:0:cebe491c-0397-4ee8-bdd0-ae6ccd1e5c7d, cib=0.920.16) : 
Event failed
Mar 22 07:45:19 aston crmd: [10962]: info: match_graph_event: Action 
copan_shelf_07_stop_0 (11) confirmed on lotus (rc=4)
Mar 22 07:45:22 aston pengine: [10961]: WARN: unpack_rsc_op: Processing 
failed op copan_shelf_07_stop_0 on lotus: unknown error (1)
Mar 22 07:45:22 aston pengine: [10961]: ERROR: unpack_rsc_op: Making 
sure copan_shelf_07 doesn't come up again
Mar 22 07:45:22 aston pengine: [10961]: notice: native_print: 
copan_shelf_07    (ocf::sgi:copan_ov_client):     Started lotus FAILED
Mar 22 07:45:22 aston pengine: [10961]: info: get_failcount: 
copan_shelf_07 has failed INFINITY times on lotus
Mar 22 07:45:22 aston pengine: [10961]: WARN: common_apply_stickiness: 
Forcing copan_shelf_07 away from lotus after 1000000 failures (max=1)
Mar 22 07:45:22 aston pengine: [10961]: info: rsc_merge_weights: 
cxfs-client-clone: Rolling back scores from copan_shelf_07
Mar 22 07:45:22 aston pengine: [10961]: info: native_color: Resource 
copan_shelf_07 cannot run anywhere
Mar 22 07:45:22 aston pengine: [10961]: notice: LogActions: Stop 
resource copan_shelf_07        (lotus)
Mar 22 07:45:23 aston crmd: [10962]: info: te_rsc_command: Initiating 
action 14: stop copan_shelf_07_stop_0 on lotus
Mar 22 07:45:24 aston crmd: [10962]: info: match_graph_event: Action 
copan_shelf_07_stop_0 (14) confirmed on lotus (rc=0)
Mar 22 08:00:25 aston pengine: [10961]: notice: native_print: 
copan_shelf_07    (ocf::sgi:copan_ov_client):     Stopped
Mar 22 08:00:25 aston pengine: [10961]: info: get_failcount: 
copan_shelf_07 has failed INFINITY times on lotus
Mar 22 08:00:25 aston pengine: [10961]: WARN: common_apply_stickiness: 
Forcing copan_shelf_07 away from lotus after 1000000 failures (max=1)
Mar 22 08:00:25 aston pengine: [10961]: notice: RecurringOp:  Start 
recurring monitor (30s) for copan_shelf_07 on aston
Mar 22 08:00:25 aston pengine: [10961]: notice: LogActions: Start 
copan_shelf_07        (aston)
Mar 22 08:00:25 aston crmd: [10962]: info: te_rsc_command: Initiating 
action 16: start copan_shelf_07_start_0 on aston (local)
Mar 22 08:00:25 aston crmd: [10962]: info: do_lrm_rsc_op: Performing 
key=16:513:0:cebe491c-0397-4ee8-bdd0-ae6ccd1e5c7d 
op=copan_shelf_07_start_0 )
Mar 22 08:00:25 aston lrmd: [10959]: debug: on_msg_perform_op:2359: 
copying parameters for rsc copan_shelf_07
Mar 22 08:00:25 aston lrmd: [10959]: debug: on_msg_perform_op: add an 
operation operation start[142] on ocf::copan_ov_client::copan_shelf_07 
for client 10962, its parameters: CRM_meta_name=[start] 
crm_feature_set=[3.0.2] CRM_meta_on_fail=[restart] 
CRM_meta_timeout=[60000] shelf_name=[c07]  to the operation list.
Mar 22 08:00:25 aston lrmd: [10959]: info: rsc:copan_shelf_07:142: start


Thanks,

Phil

-- 
	Phil Armstrong       pma at sgi.com
	Phone: 651-683-5561  VNET 233-5561





More information about the Pacemaker mailing list