[Pacemaker] Speed up resource failover?

Patrick H. pacemaker at feystorm.net
Wed Jan 12 16:41:31 EST 2011


>> Oh, and it's not waiting for the resource to stop on the other node
>> before it starts it up either.
>> Here's the lrmd log for resource vip_55.63 from the 'ha02' node (the  
>> node I put into standby)
>> Jan 12 16:10:24 ha02 lrmd: [5180]: info: rsc:vip_55.63:1444: stop
>> Jan 12 16:10:24 ha02 lrmd: [5180]: info: Managed vip_55.63:stop process  
>> 19063 exited with return code 0.
>>
>>
>> And here's the lrmd log for the same resource on 'ha01'
>> Jan 12 16:10:50 ha01 lrmd: [4707]: info: rsc:vip_55.63:1390: start
>> Jan 12 16:10:50 ha01 lrmd: [4707]: info: Managed vip_55.63:start process  
>> 8826 exited with return code 0.
>>
>>
>> Notice that it stopped it a full 26 seconds (16:10:24 to 16:10:50)
>> before it tried to start it on the other node. The times on both
>> boxes are in sync, so it's not that either.
>>     
>
> Is this the case when you wanted to fail-over a single resource
> or was it part of the node standby process?
>
> Thanks,
>
> Dejan
>   
In that case I put the node in standby.


While digging around a bit more, I noticed this:
Jan 12 17:24:56 ha01 crmd: [4710]: info: te_rsc_command: Initiating action 966: stop vip_55.236_stop_0 on ha01 (local)
Jan 12 17:24:56 ha01 crmd: [4710]: info: do_lrm_rsc_op: Performing key=966:14345:0:0e860f83-8611-4873-829f-2a0c6fcf6667 op=vip_55.236_stop_0 )
Jan 12 17:24:56 ha01 lrmd: [4707]: info: rsc:vip_55.236:1714: stop
Jan 12 17:24:56 ha01 lrmd: [4707]: info: Managed vip_55.236:stop process 11414 exited with return code 0.
Jan 12 17:24:56 ha01 crmd: [4710]: info: process_lrm_event: LRM operation vip_55.236_stop_0 (call=1714, rc=0, cib-update=19621, confirmed=true) ok
Jan 12 17:25:04 ha01 crmd: [4710]: info: match_graph_event: Action vip_55.236_stop_0 (966) confirmed on ha01 (rc=0)
Jan 12 17:25:04 ha01 crmd: [4710]: info: te_rsc_command: Initiating action 967: start vip_55.236_start_0 on ha02
Jan 12 17:25:28 ha01 crmd: [4710]: info: match_graph_event: Action vip_55.236_start_0 (967) confirmed on ha02 (rc=0)

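Notice the long delays before match_graph_event on both the stop and
the start. The stop finished at 17:24:56 but was not confirmed until
17:25:04 (8 seconds later), and the start was initiated at 17:25:04 but
not confirmed until 17:25:28 (24 seconds later). So it seems everything
is waiting on match_graph_event. What is this?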
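
To make the gaps easier to spot, here is a rough Python sketch that
prints the time elapsed between consecutive log lines. It is only an
illustration: the helper names (parse_time, gaps) are made up for this
example, the log lines are copied from the excerpt above and joined
onto single lines, and the year 2011 is assumed because syslog
timestamps do not carry one.

```python
from datetime import datetime

# Log lines copied from the excerpt in this message, one event per line.
LOG = """\
Jan 12 17:24:56 ha01 crmd: [4710]: info: te_rsc_command: Initiating action 966: stop vip_55.236_stop_0 on ha01 (local)
Jan 12 17:24:56 ha01 lrmd: [4707]: info: Managed vip_55.236:stop process 11414 exited with return code 0.
Jan 12 17:25:04 ha01 crmd: [4710]: info: match_graph_event: Action vip_55.236_stop_0 (966) confirmed on ha01 (rc=0)
Jan 12 17:25:04 ha01 crmd: [4710]: info: te_rsc_command: Initiating action 967: start vip_55.236_start_0 on ha02
Jan 12 17:25:28 ha01 crmd: [4710]: info: match_graph_event: Action vip_55.236_start_0 (967) confirmed on ha02 (rc=0)
"""


def parse_time(line, year=2011):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp of a line.

    The year is an assumption (syslog omits it), and %b expects English
    month abbreviations, i.e. a C/English locale.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def gaps(log_text):
    """Return (seconds_since_previous_line, message) for each line after the first."""
    lines = [l for l in log_text.splitlines() if l.strip()]
    times = [parse_time(l) for l in lines]
    result = []
    for prev, cur, line in zip(times, times[1:], lines[1:]):
        message = line.split("info: ", 1)[1]
        result.append((int((cur - prev).total_seconds()), message))
    return result


if __name__ == "__main__":
    for seconds, message in gaps(LOG):
        print(f"+{seconds:>3}s  {message[:60]}")
```

Run against these lines, the only non-zero gaps are the ones in front
of the two match_graph_event entries, which is what makes me think the
time is being spent between the operation finishing and crmd matching
it against the transition graph.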
