[Pacemaker] staggered resource startup

Andrew Beekhof andrew at beekhof.net
Tue Aug 27 18:18:48 EDT 2013

On 28/08/2013, at 12:20 AM, Matthew O'Connor <matt at ecsorl.com> wrote:

> Hi!
> I have a server that operates about 30 virtual machines.  Normally it
> handles this load very well, but restart can be a bit dicey.  I have
> found that by staggering the vm startups - currently done manually - the
> system handles the growing load much more gracefully.  The sequence goes
> something like this:
> 1. node reboots
> 2. pacemaker (and related) is started
> 3. immediately, all vm resources are stopped (for X in `crm status
> --inactive | grep....`...; do crm resource stop $X...)
> 4. once pacemaker has brought the node online, all vm resources are
> started one at a time (for X...; crm resource start $X; sleep 45s; done)
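
The manual sequence quoted above could be scripted roughly as follows. This is only a sketch: the resource names vm1..vm3 are hypothetical, the list of resources is passed in explicitly rather than derived from crm status, and DRY_RUN, run and staggered_start are names made up for this example. With the default DRY_RUN=1 the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch only: start the given resources one at a time with a pause in between.
# DRY_RUN, run and staggered_start are hypothetical names for this example.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them on a cluster node

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

staggered_start() {
    # $1: delay in seconds between starts; remaining arguments: resource names
    delay="$1"
    shift
    for rsc in "$@"; do
        run crm resource start "$rsc"
        # Skip the pause in dry-run mode so the preview returns immediately.
        if [ "$DRY_RUN" != "1" ]; then
            sleep "$delay"
        fi
    done
}

# 45 seconds matches the delay mentioned above; vm1..vm3 are placeholders.
staggered_start 45 vm1 vm2 vm3
```

The fixed delay is the same arbitrary wait used in the manual procedure; the script does not check whether a guest has actually finished booting.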

Are you aware of this cluster option?

       batch-limit = integer [30]
           The number of jobs that the TE is allowed to execute in parallel

           The "correct" value will depend on the speed and load of your network and cluster nodes.

As long as the VMs are truly started before the resource agent reports "done", then lowering this value should spread the load out more.
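
As a minimal sketch, the option can be lowered with the crm shell already used in this thread. The value 2 is an arbitrary illustration, not a recommendation, and DRY_RUN, run and set_batch_limit are hypothetical names for this example. With the default DRY_RUN=1 the command is only printed.

```shell
#!/bin/sh
# Sketch only: lower batch-limit so the cluster runs fewer actions in parallel.
# DRY_RUN, run and set_batch_limit are hypothetical names for this example.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, 0 = execute it on a cluster node

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

set_batch_limit() {
    # $1: number of jobs the cluster may execute in parallel
    run crm configure property batch-limit="$1"
}

set_batch_limit 2   # arbitrary example value; tune it for your own hardware
```

Note that batch-limit is a cluster-wide option, so it throttles all actions and not just VM starts.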

It is also possible to specify ordering constraints with kind=Serialize, which ensures that the start and stop actions of the constrained resources never run concurrently.
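
A serializing constraint between two VMs might look like the following sketch. The resource names vm1 and vm2 and the constraint id are hypothetical, as are the DRY_RUN, run and serialize_pair helpers. The crm shell form assumes a crmsh version that accepts a kind in order constraints; the equivalent CIB XML is given in a comment. With the default DRY_RUN=1 the command is only printed.

```shell
#!/bin/sh
# Sketch only: vm1 and vm2 are hypothetical resource names.
# Equivalent CIB XML for the constraint created below:
#   <rsc_order id="serialize-vm1-vm2" first="vm1" then="vm2" kind="Serialize"/>
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, 0 = execute it on a cluster node

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

serialize_pair() {
    # $1, $2: resources whose start/stop actions must not overlap
    run crm configure order "serialize-$1-$2" Serialize: "$1" "$2"
}

serialize_pair vm1 vm2
```

A constraint like this only covers the two resources it names, so serializing a larger set of VMs needs more constraints than this one pair.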


> There are two things I'd like to accomplish, but if I can only get one,
> that would be fine too.
> First and foremost, I'd like to have Pacemaker stagger the startup of
> certain resources according to a time delay.  Although in the example
> above the node is rebooted, in a two-or-more-node case a single node
> failure might dump a significant number of resources onto the surviving
> nodes and (more significantly) thereby dump a huge amount of load
> on the SAN that backs the vm host(s).  Having the vm startups or
> restarts staggered automatically would help mitigate this.  Staggering
> should be relative to other relevant resources.  (Ordering takes care of
> delaying the vms from starting till after the SAN stores mount, but each
> VM should wait a while before another VM kicks off.  A failure of one VM
> to start should not prevent other VMs from starting.)
> Second, I think it would be useful to be able to group the resources
> together for staggered startup.  For instance, most of my vms are linux,
> and they boot very quickly with little load.  Some are Windows, and they
> load the host and SAN very badly on boot.  I would ideally create small
> groups of linux hosts (to be started together) and start the windows
> hosts one at a time (or, another way to think of it, put them in groups
> of one each, so that I'm staggering the groups instead of the individual
> resources).
> A key to making this work will be specifying the delay between starting
> successive vms/groups.  The vm-start command returns from libvirt almost
> immediately, but I want to wait a while for the virtual machine to boot -
> something I don't know yet how to easily check for in pacemaker.
> Although it does seem a little kludgey to put an arbitrary time delay,
> it also appears to be very effective for my situation.
> NB: the groups I describe above have no relationship to groups in the
> classical Pacemaker sense; they don't have to live together, nor is
> there necessarily a hard order of startup or shutdown described.  If one
> resource in a staggering-group fails or is stopped, it has no effect on
> the rest of the group.  There is only the notion that those resources
> should be started together, and started after or before some other group
> of resources + a time delay.  In essence, whereas Pacemaker groups
> describe what to start, I am looking to describe when to start.  I don't
> think stop-staggering has much use here, though I suppose executing
> large batches of stops the same way as staggered-start would prevent the
> vms from all flushing to the SAN at the same time.
> Is there a way to do this with the latest Pacemaker?
> (Sorry this got a bit long-winded...)
> Thanks!!
> -- Matthew
