[Pacemaker] Speed up resource failover?

Dejan Muhamedagic dejanmm at fastmail.fm
Fri Jan 14 06:45:31 EST 2011


Hi,

On Wed, Jan 12, 2011 at 02:41:31PM -0700, Patrick H. wrote:
> 
> >>Oh, and it's not waiting for the resource to stop on the other
> >>node before it starts it up either.
> >>Here's the lrmd log for resource vip_55.63 from the 'ha02' node
> >>(the node I put into standby):
> >>Jan 12 16:10:24 ha02 lrmd: [5180]: info: rsc:vip_55.63:1444: stop
> >>Jan 12 16:10:24 ha02 lrmd: [5180]: info: Managed vip_55.63:stop process 19063 exited with return code 0.
> >>
> >>
> >>And here's the lrmd log for the same resource on 'ha01'
> >>Jan 12 16:10:50 ha01 lrmd: [4707]: info: rsc:vip_55.63:1390: start
> >>Jan 12 16:10:50 ha01 lrmd: [4707]: info: Managed vip_55.63:start process 8826 exited with return code 0.
> >>
> >>
> >>Notice that it stopped it a full 26 seconds before it tried to
> >>start it on the other node. The times on both boxes are in
> >>sync, so it's not that either.
> >
> >Is this the case when you wanted to fail-over a single resource
> >or was it part of the node standby process?
> >
> >Thanks,
> >
> >Dejan
> In that case I put the node in standby.
> 
> 
> While digging around a bit more, I noticed this:
> Jan 12 17:24:56 ha01 crmd: [4710]: info: te_rsc_command: Initiating action 966: stop vip_55.236_stop_0 on ha01 (local)
> Jan 12 17:24:56 ha01 crmd: [4710]: info: do_lrm_rsc_op: Performing key=966:14345:0:0e860f83-8611-4873-829f-2a0c6fcf6667 op=vip_55.236_stop_0 )
> Jan 12 17:24:56 ha01 lrmd: [4707]: info: rsc:vip_55.236:1714: stop
> Jan 12 17:24:56 ha01 lrmd: [4707]: info: Managed vip_55.236:stop process 11414 exited with return code 0.
> Jan 12 17:24:56 ha01 crmd: [4710]: info: process_lrm_event: LRM operation vip_55.236_stop_0 (call=1714, rc=0, cib-update=19621, confirmed=true) ok
> Jan 12 17:25:04 ha01 crmd: [4710]: info: match_graph_event: Action vip_55.236_stop_0 (966) confirmed on ha01 (rc=0)
> Jan 12 17:25:04 ha01 crmd: [4710]: info: te_rsc_command: Initiating action 967: start vip_55.236_start_0 on ha02
> Jan 12 17:25:28 ha01 crmd: [4710]: info: match_graph_event: Action vip_55.236_start_0 (967) confirmed on ha02 (rc=0)
> 
> Notice the long delays before match_graph_event on both the stop
> (8 seconds) and the start (24 seconds). So it seems everything is
> waiting on match_graph_event. What is this?

Can't say, but perhaps Andrew would know, though I'm not sure
there's enough information here. Best to open a bugzilla entry and
attach an hb_report.
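For anyone wanting to measure these gaps systematically before filing, here is a minimal sketch (not part of the original exchange, and not a Pacemaker tool) that pairs each crmd "te_rsc_command: Initiating action N" line with the matching "match_graph_event: Action ... (N) confirmed" line and reports the elapsed time. It assumes syslog timestamps in the "Mon DD HH:MM:SS" form quoted above, a year of 2011 (syslog omits it), and C-locale month names; the helper names parse_ts and action_latencies are hypothetical. It only quantifies the delay seen in the log and says nothing about its cause.

```python
import re
from datetime import datetime

# Syslog lines carry no year; 2011 is assumed from the message date.
YEAR = 2011

# Patterns for the crmd transition-engine messages quoted above.
INITIATE = re.compile(r"te_rsc_command: Initiating action (\d+):")
CONFIRM = re.compile(r"match_graph_event: Action (\S+) \((\d+)\) confirmed")


def parse_ts(line: str) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' timestamp (C-locale month names assumed)."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def action_latencies(lines):
    """Return {action_id: (operation, seconds from initiation to confirmation)}."""
    initiated = {}
    latencies = {}
    for line in lines:
        m = INITIATE.search(line)
        if m:
            initiated[m.group(1)] = parse_ts(line)
            continue
        m = CONFIRM.search(line)
        if m and m.group(2) in initiated:
            elapsed = parse_ts(line) - initiated[m.group(2)]
            latencies[m.group(2)] = (m.group(1), elapsed.total_seconds())
    return latencies


# Sample input: the crmd lines from the ha01 log quoted above.
CRMD_LINES = [
    "Jan 12 17:24:56 ha01 crmd: [4710]: info: te_rsc_command: "
    "Initiating action 966: stop vip_55.236_stop_0 on ha01 (local)",
    "Jan 12 17:25:04 ha01 crmd: [4710]: info: match_graph_event: "
    "Action vip_55.236_stop_0 (966) confirmed on ha01 (rc=0)",
    "Jan 12 17:25:04 ha01 crmd: [4710]: info: te_rsc_command: "
    "Initiating action 967: start vip_55.236_start_0 on ha02",
    "Jan 12 17:25:28 ha01 crmd: [4710]: info: match_graph_event: "
    "Action vip_55.236_start_0 (967) confirmed on ha02 (rc=0)",
]

# The lrmd stop on ha02 and start on ha01 for vip_55.63, quoted above.
HA02_STOP = "Jan 12 16:10:24 ha02 lrmd: [5180]: info: rsc:vip_55.63:1444: stop"
HA01_START = "Jan 12 16:10:50 ha01 lrmd: [4707]: info: rsc:vip_55.63:1390: start"

if __name__ == "__main__":
    for action, (op, secs) in action_latencies(CRMD_LINES).items():
        print(f"action {action} {op}: {secs:.0f} s")  # 966 -> 8 s, 967 -> 24 s
    gap = parse_ts(HA01_START) - parse_ts(HA02_STOP)
    print(f"vip_55.63 stop-to-start gap: {gap.total_seconds():.0f} s")  # 26 s
```

Fed a complete crmd log instead of this excerpt, the same pairing would show whether every action in a transition sees a similar delay or only some of them.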

Thanks,

Dejan

> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker