[Pacemaker] Surprisingly fast start of resources on cluster failover.

Wed Mar 7 16:08:25 EST 2012

On Tue, Mar 06, 2012 at 01:49:11PM +0100, Florian Crouzat wrote:
> Hi,
> 
> On a two-node active/passive cluster, I placed a location
> constraint with a score of 50 for #uname node1. As soon as it was
> applied, resources moved from node2 to node1, as expected.
> I have an LSB init script defined as a resource:
> 
> $ crm configure show firewall
> primitive firewall lsb:firewall \
>         op monitor on-fail="restart" interval="10s" \
>         op start interval="0" timeout="3min" \
>         op stop interval="0" timeout="1min" \
>         meta target-role="Started"
> 
> This LSB script takes a long time to start, at least 55 seconds when
> run from my shell over ssh.
> It logs a couple things to std{out,err}.

If "a couple things" actually happen to be "a lot",
then having stdout/err on tty via ssh in xterm ...
can slow things down.

Did you also time it with the output redirected to files?
  time /etc/init.d/firewall start >out.txt 2>err.txt
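
If you want to repeat that measurement a few times, a small wrapper
along these lines might help. It is only a sketch: the name time_quiet
is made up for illustration, and it assumes a date(1) that supports
+%s, so the resolution is whole seconds.

```shell
# time_quiet CMD [ARGS...]  (hypothetical helper, not part of Pacemaker)
# Run CMD with stdout/stderr sent to files instead of the tty,
# then report the elapsed wall-clock seconds and the exit code.
time_quiet() {
    t0=$(date +%s)
    "$@" >out.txt 2>err.txt
    rc=$?
    t1=$(date +%s)
    echo "elapsed=$((t1 - t0))s rc=$rc"
    return "$rc"
}

# Intended use (on the cluster node, while the resource is stopped):
#   time_quiet /etc/init.d/firewall start
```

If the elapsed time lands near the 24 seconds from lrmd.log rather than
the 55 seconds you saw over ssh, the terminal output is the likely cause.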

> So, while node1 was taking-over, I noticed in
> /var/log/pacemaker/lrmd.log that it only took 24 seconds to start
> that resource.

> My question: how come Pacemaker starts a resource twice as fast
> as I do from the CLI?

Other than the above suggestion,
did you verify that it ends up doing the same thing
when started from pacemaker,
compared to when started by you from the command line?
Did you compare the results?
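
One way to compare the results, as a rough sketch: dump the resulting
state after each kind of start and diff the two dumps. The helper names
snapshot and same_state are invented here, and I am only assuming the
script manages iptables rules; substitute whatever command shows the
state your script actually sets up.

```shell
# snapshot LABEL CMD [ARGS...]  (hypothetical helper)
# Save the output of CMD to state-LABEL.txt.
snapshot() {
    label=$1
    shift
    "$@" >"state-$label.txt" 2>&1
}

# same_state A B  (hypothetical helper)
# Print a unified diff of two snapshots; succeeds only if they match.
same_state() {
    diff -u "state-$1.txt" "state-$2.txt"
}

# Intended use, ASSUMING the init script sets up iptables rules:
#   snapshot manual iptables -S       # after starting it by hand
#   snapshot pacemaker iptables -S    # after pacemaker started it
#   same_state manual pacemaker
```

Any diff output would suggest that the faster start under pacemaker
is not doing everything the manual start does.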

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.