[Pacemaker] A couple of queries regarding the behaviour of ocf:heartbeat:ManageVE

Sat Mar 9 12:29:11 EST 2013

Hi,

It looks to me like ocf:heartbeat:ManageVE might do the wrong thing in a
couple of places, so I thought I'd check.

The resource agent manages openvz containers (i.e. lightweight virtual
machines AKA "VEs", think chroot++).

The principal potential problem is the stop operation:

The metadata for the resource says:

    <action name="stop" timeout="75" />

... however the stop action does this:

    $VZCTL stop $VEID >& /dev/null
    retcode=$?

    if [[ $retcode != 0 ]]; then
      ocf_log err "vzctl stop $VEID returned: $retcode"
      return $OCF_ERR_GENERIC
    fi

    return $OCF_SUCCESS

When "vzctl stop" is run, effectively a "shutdown -h
now" command gets run within the container, and the container's init
process attempts to shut down the virtual machine.  If this hasn't
happened within a certain amount of time (currently hard-coded within
vzctl to be 120 seconds), e.g. because a service has hung during
shutdown, or the system is under high load, then the vzctl command gets
more aggressive and forcibly terminates all of the processes within the
container.  Whether the shutdown is "normal" or "forced", the vzctl
return code is still 0 once the container has stopped.  The upshot of
this is that on an unloaded system (circa 2GHz Intel core2), the vzctl
command has a max execution time of approx 122 seconds.

Given the warning at the bottom of:

http://www.linux-ha.org/doc/dev-guides/_literal_stop_literal_action.html

It seems to me that <action name="stop" timeout="75" /> is a bit reckless
given the behaviour of "vzctl stop", since it effectively acts as the
default stop timeout in pacemaker.  It should probably be 120s plus some
margin (e.g. 150s).  BTW as far as I can see vzctl stop has behaved like
this for a while.
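
To make the arithmetic explicit, here is a rough shell sketch comparing
the worst-case "vzctl stop" duration with a given stop timeout.  The
variable and function names are made up for illustration, and the 2
second overhead is just the difference between the 120s wait and the
approx 122s I observed; it is not a vzctl setting.

```shell
#!/bin/bash
# Sketch only: compare the worst-case "vzctl stop" time with a stop timeout.
VZCTL_GRACE=120     # seconds vzctl waits before forcing the stop (hard-coded in vzctl)
KILL_OVERHEAD=2     # assumed extra seconds, from the ~122s seen on an unloaded system

# Print the worst-case duration of "vzctl stop" under these assumptions.
worst_case_stop() {
    echo $((VZCTL_GRACE + KILL_OVERHEAD))
}

# Succeed only if the given stop timeout (in seconds) exceeds the worst case.
timeout_is_safe() {
    [ "$1" -gt "$(worst_case_stop)" ]
}

for t in 75 150; do
    if timeout_is_safe "$t"; then
        echo "timeout=${t}s: covers a forced stop"
    else
        echo "timeout=${t}s: may expire before vzctl finishes"
    fi
done
```

With these numbers the 75s value fails the check and 150s passes it; on
a loaded system the overhead could well be larger.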

A more flexible solution might be to make the timeout configurable, but
in the absence of this, I think upping the stop action timeout
seems like the right thing to do.
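
In the meantime, operation timeouts can be set per resource in the
cluster configuration.  A minimal sketch in crm shell configuration
syntax, assuming a container with ID 101 and a resource called ve101
(both hypothetical), and assuming the agent's container ID parameter is
named veid; the monitor values are purely illustrative:

```
primitive ve101 ocf:heartbeat:ManageVE \
    params veid="101" \
    op stop timeout="150s" \
    op monitor interval="10s" timeout="10s"
```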

The second potential problem (the correct behaviour here is a bit less
clear to me) is with the 'start' command.  It currently starts
containers asynchronously (i.e. using "vzctl start"  instead of "vzctl
start --wait"), and then returns immediately.  The monitor operation
then immediately starts declaring the container to be started, i.e.
returning OCF_SUCCESS whilst it is still starting up, as well as once
it has fully started.

This always-async behaviour effectively defeats any attempt to use
something like batch-limit to throttle simultaneous startup of VEs,
which can obviously be a pretty heavy-weight operation, especially when
the VEs are relatively fat.  For instance, simultaneously starting up
all the VEs on one of the nodes which I have in mind will make the load
average hit > 100, and could result in the node being fenced for being
unresponsive.  This would then make the same thing happen on the other
node - splat.

I'm thinking that ocf:heartbeat:ManageVE should probably default to the
"vzctl start --wait" case.  I'm not sure if the timeout for start should
be raised tho' (maybe it should even be lowered as
http://www.linux-ha.org/doc/dev-guides/_metadata.html states "This is a
hint to the user what minimal timeout should be configured for the
action.").
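
For illustration, a blocking start could look roughly like the sketch
below, which mirrors the shape of the stop excerpt above.  It is not the
agent's actual start code: ve_start_wait is a made-up name, the fallback
definitions for VZCTL and the OCF_* return codes are assumptions so that
the snippet runs standalone, and ocf_log is replaced by a plain echo to
stderr.

```shell
#!/bin/bash
# Sketch only: a start action that blocks until the container has booted.
# In the real agent VZCTL, VEID and the OCF_* codes come from the agent
# itself and the OCF shell function library; the defaults here are stand-ins.
: "${VZCTL:=/usr/sbin/vzctl}"
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"

ve_start_wait() {
    # --wait makes vzctl return only once the container has finished starting,
    # so the start action's duration reflects the real startup cost.
    "$VZCTL" start "$VEID" --wait >/dev/null 2>&1
    retcode=$?

    if [ "$retcode" -ne 0 ]; then
        echo "vzctl start $VEID --wait returned: $retcode" >&2
        return "$OCF_ERR_GENERIC"
    fi

    return "$OCF_SUCCESS"
}
```

Because the call blocks, the start timeout would then have to cover a
full container boot, which is why I'm unsure what the metadata hint
should say.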

Any feedback welcome!

Tim.

-- 
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.  
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309