[Pacemaker] Speed up resource failover?
Patrick H.
pacemaker at feystorm.net
Wed Jan 12 16:41:31 EST 2011
>> Oh, and it's not waiting for the resource to stop on the other node
>> before it starts it up either.
>> Here's the lrmd log for resource vip_55.63 from the 'ha02' node (the
>> node I put into standby)
>> Jan 12 16:10:24 ha02 lrmd: [5180]: info: rsc:vip_55.63:1444: stop
>> Jan 12 16:10:24 ha02 lrmd: [5180]: info: Managed vip_55.63:stop process 19063 exited with return code 0.
>>
>>
>> And here's the lrmd log for the same resource on 'ha01'
>> Jan 12 16:10:50 ha01 lrmd: [4707]: info: rsc:vip_55.63:1390: start
>> Jan 12 16:10:50 ha01 lrmd: [4707]: info: Managed vip_55.63:start process 8826 exited with return code 0.
>>
>>
>> Notice that it stopped the resource a full 26 seconds (16:10:24 to
>> 16:10:50) before it tried to start it on the other node. The clocks on
>> both boxes are in sync, so it's not that either.
>>
>
> Is this the case when you wanted to fail-over a single resource
> or was it part of the node standby process?
>
> Thanks,
>
> Dejan
>
In that case I put the node in standby.
While digging around a bit more, I noticed this:
Jan 12 17:24:56 ha01 crmd: [4710]: info: te_rsc_command: Initiating action 966: stop vip_55.236_stop_0 on ha01 (local)
Jan 12 17:24:56 ha01 crmd: [4710]: info: do_lrm_rsc_op: Performing key=966:14345:0:0e860f83-8611-4873-829f-2a0c6fcf6667 op=vip_55.236_stop_0 )
Jan 12 17:24:56 ha01 lrmd: [4707]: info: rsc:vip_55.236:1714: stop
Jan 12 17:24:56 ha01 lrmd: [4707]: info: Managed vip_55.236:stop process 11414 exited with return code 0.
Jan 12 17:24:56 ha01 crmd: [4710]: info: process_lrm_event: LRM operation vip_55.236_stop_0 (call=1714, rc=0, cib-update=19621, confirmed=true) ok
Jan 12 17:25:04 ha01 crmd: [4710]: info: match_graph_event: Action vip_55.236_stop_0 (966) confirmed on ha01 (rc=0)
Jan 12 17:25:04 ha01 crmd: [4710]: info: te_rsc_command: Initiating action 967: start vip_55.236_start_0 on ha02
Jan 12 17:25:28 ha01 crmd: [4710]: info: match_graph_event: Action vip_55.236_start_0 (967) confirmed on ha02 (rc=0)
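Notice the long delays before match_graph_event on both the stop and the
start: the stop finished and was reported by process_lrm_event at
17:24:56, but match_graph_event did not confirm it until 17:25:04 (8
seconds later), and the start on ha02 was not confirmed until 17:25:28
(24 seconds after it was initiated). So it seems everything is waiting on
match_graph_event. What is this?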
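To make the delays easier to compare, here is a minimal Python sketch that reads the syslog timestamps from the crmd lines above and reports, for each action, the seconds between te_rsc_command initiating it and match_graph_event confirming it. It assumes the exact line format pasted here (wrapped lines rejoined) and the year 2011 from the message date; the regexes and the helper names `timestamp` and `action_latencies` are hypothetical and only meant for these lines, not a general Pacemaker log parser.

```python
import re
from datetime import datetime

# crmd lines from the message above, with the wrapped lines rejoined.
LOG = """\
Jan 12 17:24:56 ha01 crmd: [4710]: info: te_rsc_command: Initiating action 966: stop vip_55.236_stop_0 on ha01 (local)
Jan 12 17:24:56 ha01 crmd: [4710]: info: process_lrm_event: LRM operation vip_55.236_stop_0 (call=1714, rc=0, cib-update=19621, confirmed=true) ok
Jan 12 17:25:04 ha01 crmd: [4710]: info: match_graph_event: Action vip_55.236_stop_0 (966) confirmed on ha01 (rc=0)
Jan 12 17:25:04 ha01 crmd: [4710]: info: te_rsc_command: Initiating action 967: start vip_55.236_start_0 on ha02
Jan 12 17:25:28 ha01 crmd: [4710]: info: match_graph_event: Action vip_55.236_start_0 (967) confirmed on ha02 (rc=0)
"""

# Hypothetical patterns, written against the lines above only.
INITIATED = re.compile(r"te_rsc_command: Initiating action \d+: \w+ (\S+) on ")
CONFIRMED = re.compile(r"match_graph_event: Action (\S+) \(\d+\) confirmed")


def timestamp(line: str, year: int = 2011) -> datetime:
    """Parse the leading syslog stamp (e.g. "Jan 12 17:24:56").
    Syslog omits the year, so assume the one from the message date."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def action_latencies(log: str) -> dict[str, float]:
    """Seconds from te_rsc_command 'Initiating' to match_graph_event
    'confirmed' for each action name found in the log."""
    initiated: dict[str, datetime] = {}
    latencies: dict[str, float] = {}
    for line in log.splitlines():
        if m := INITIATED.search(line):
            initiated[m.group(1)] = timestamp(line)
        elif (m := CONFIRMED.search(line)) and m.group(1) in initiated:
            delta = timestamp(line) - initiated[m.group(1)]
            latencies[m.group(1)] = delta.total_seconds()
    return latencies


if __name__ == "__main__":
    for name, secs in action_latencies(LOG).items():
        print(f"{name}: {secs:.0f} s")
    # prints "vip_55.236_stop_0: 8 s" then "vip_55.236_start_0: 24 s"

    # The same helper gives the stop-to-start gap from the earlier lrmd lines.
    stop = "Jan 12 16:10:24 ha02 lrmd: [5180]: info: rsc:vip_55.63:1444: stop"
    start = "Jan 12 16:10:50 ha01 lrmd: [4707]: info: rsc:vip_55.63:1390: start"
    print((timestamp(start) - timestamp(stop)).total_seconds())  # 26.0
```

Matching on the action name rather than on line order keeps the pairing correct even if several transitions interleave in a longer log.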