[Pacemaker] Live migration with order constraints

Florian Haas florian at hastexo.com
Fri Nov 11 07:59:55 UTC 2011

On 2011-11-11 01:23, Dmitry Golubev wrote:
>> ManageVE has supported migration using chkpnt/restore since resource-agents
>> version 1.0.4. But if I understand the OpenVZ migration concept
>> correctly (please someone correct me if I'm wrong!), there is no need
>> for shared storage.
>> The vzmigrate script rsyncs the complete data, config and state between
>> nodes, so no shared storage is needed.
>> Of course you would need twice the disk space, but this is also true for
>> DRBD replication. Extending ManageVE to use vzmigrate for live migration
>> looks quite straightforward to me.

Let me chime in here, as I originally added migrate_from and migrate_to
to ManageVE.

> My apologies - I did not notice ManageVE has migrate actions, as its usage help
> does not list them (somebody forgot to add them).

Sorry about that; I've added them now. As a general rule, though, the
authoritative documentation for a resource agent is always "crm ra info
<agent>" or the RA man page (in the ManageVE case, "man
ocf_heartbeat_ManageVE"), both of which do mention migration. I have,
however, just updated the man page auto-generation so that it includes an
additional paragraph informing people that the RA supports native migration.
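For scripted checks, the same information lives in the agent's OCF meta-data XML, which every agent prints when invoked with the `meta-data` action: an agent that supports native migration lists `migrate_to` and `migrate_from` in its `<actions>` element. A minimal Python sketch, assuming you have already captured that output; the XML below is a trimmed, hypothetical example, not ManageVE's actual meta-data.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical meta-data. A real agent prints its full XML
# (parameters, descriptions, etc.) when run with the "meta-data" action.
SAMPLE_METADATA = """<resource-agent name="ManageVE">
  <actions>
    <action name="start" timeout="75" />
    <action name="stop" timeout="75" />
    <action name="monitor" timeout="30" interval="10" depth="0" />
    <action name="migrate_to" timeout="75" />
    <action name="migrate_from" timeout="75" />
    <action name="meta-data" timeout="5" />
  </actions>
</resource-agent>
"""


def ra_actions(metadata_xml: str) -> set[str]:
    """Return the action names advertised in an OCF RA meta-data document."""
    root = ET.fromstring(metadata_xml)
    return {a.get("name") for a in root.iter("action")}


def supports_native_migration(metadata_xml: str) -> bool:
    """True if the agent advertises both migrate_to and migrate_from."""
    return {"migrate_to", "migrate_from"} <= ra_actions(metadata_xml)


if __name__ == "__main__":
    print(sorted(ra_actions(SAMPLE_METADATA)))
    print(supports_native_migration(SAMPLE_METADATA))
```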

> However, it is not as easy as
> it seems. ManageVE makes a checkpoint and restores the machine (it does not use
> vzmigrate, for reasons I will explain further on), but it also needs shared or
> migratable storage to place the dump file on. So ManageVE has exactly
> the same issue with migration as I mentioned. You can look at the source: it
> has a comment that says exactly that.

Yes, and there is a simple reason for that: it's much faster than
vzmigrate. The checkpoint and restore can be completed in a matter of
seconds, and the incurred downtime is minimal. In HA configurations,
uptime is something people care about a lot, so it made sense to
implement it that way.

> The vzmigrate script, on the other hand, works a bit differently: it transfers
> the whole virtual machine over the network. This approach has three
> obvious drawbacks. First, it has to send a huge amount of data over the
> network, so it will be very, very slow (I've seen such a migration take hours
> when the virtual machine is very large, say a terabyte), and it will also slow
> down the entire disk subsystem of the currently active node. Second, it will
> not be live at all, since it needs to suspend the machine and synchronize
> whatever was left unsynchronized during the first run (all the modifications
> that took place during the first rsync). It will also need to recalculate
> quota, which takes a lot of time as well (for a terabyte virtual machine I
> would estimate the quota calculation at up to an hour, depending on the disk
> subsystem). And third, most importantly, there will be zero fault tolerance,
> as the copy on the second node is not kept synchronized with the current
> primary. Now, I do not mean to say that vzmigrate is evil or incorrect: it
> has its purposes, and I've used it many times to migrate virtual machines to
> new disks (where the disks cannot be shared), and I was very, very happy with
> how it works... but it is just not suitable for this particular purpose.
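To put "hours" into numbers: the first rsync pass alone has to move the container's entire data set. A back-of-the-envelope sketch, assuming a 1 TB container and an effective throughput of 110 MB/s (roughly what a saturated gigabit link delivers). Both figures are illustrative assumptions, and the suspended second pass plus the quota recalculation come on top of this.

```python
def transfer_hours(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Time for one full copy of size_bytes at a sustained throughput, in hours."""
    return size_bytes / throughput_bytes_per_s / 3600.0


TB = 10**12  # decimal terabyte
MB = 10**6   # decimal megabyte

# Assumed figures: 1 TB container, ~110 MB/s effective over gigabit Ethernet.
hours = transfer_hours(1 * TB, 110 * MB)
print(f"{hours:.1f} h")  # prints "2.5 h"
```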
> A filesystem on an active-passive DRBD, on the other hand, provides full online
> synchronization, so not only could the second node take over if the primary
> failed, but live migration would also be just a matter of dumping the memory
> file, unmounting the filesystem, remounting it on the other node and reading
> the memory file back: fast, clean and simple.
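The DRBD-based sequence described in the quote can be written down as an ordered list of commands. The sketch below only builds the plan and executes nothing. The container ID, DRBD resource name, device, mount point and dump file path are hypothetical placeholders, and the command lines use the generic `vzctl chkpnt/restore --dumpfile` and `drbdadm primary/secondary` forms rather than anything taken from ManageVE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    node: str               # node the command would run on
    argv: tuple[str, ...]   # command line (not executed here)


def drbd_migration_plan(ctid, source, target, drbd_res="r0",
                        device="/dev/drbd0", mountpoint="/vz"):
    """Ordered steps for moving a checkpointed container between DRBD nodes.

    All defaults are hypothetical placeholders. The dump file is placed on the
    DRBD-backed filesystem so that it travels with the container's data.
    """
    dumpfile = f"{mountpoint}/dump/Dump.{ctid}"
    return [
        # 1. Dump the memory state on the source (this also stops the container).
        Step(source, ("vzctl", "chkpnt", str(ctid), "--dumpfile", dumpfile)),
        # 2. Release the filesystem and demote DRBD on the source.
        Step(source, ("umount", mountpoint)),
        Step(source, ("drbdadm", "secondary", drbd_res)),
        # 3. Promote DRBD and mount the filesystem on the target.
        Step(target, ("drbdadm", "primary", drbd_res)),
        Step(target, ("mount", device, mountpoint)),
        # 4. Read the dump back and resume the container on the target.
        Step(target, ("vzctl", "restore", str(ctid), "--dumpfile", dumpfile)),
    ]


for step in drbd_migration_plan(101, "node1", "node2"):
    print(step.node, " ".join(step.argv))
```

Keeping the plan as data rather than running it directly makes the ordering constraint explicit: nothing on the target may touch the device before the source has unmounted it and demoted DRBD.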

Right. So the only thing you're saying is that, rather than doing vzctl
stop during stop and vzctl chkpnt during migrate_to, you always
want it to do vzctl chkpnt during stop, too? Well, that's something we
can add to the existing RA; again, no need to roll your own.
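To make the proposal concrete, here is a minimal sketch of how the stop and start actions could map to vzctl commands under such an option. `checkpoint_on_stop` is a hypothetical parameter name, not an existing ManageVE parameter, and the dump file is assumed to sit on storage that both nodes can reach. The real RA is a shell script, so this only illustrates the decision logic.

```python
import os


def stop_command(ctid: int, checkpoint_on_stop: bool, dumpfile: str) -> list[str]:
    """Command the stop action would run for container ctid (hypothetical option)."""
    if checkpoint_on_stop:
        # Save the running state so the next start can resume instead of booting.
        return ["vzctl", "chkpnt", str(ctid), "--dumpfile", dumpfile]
    return ["vzctl", "stop", str(ctid)]


def start_command(ctid: int, dumpfile: str, exists=os.path.exists) -> list[str]:
    """Restore from a dump if one is present, otherwise perform a normal start."""
    if exists(dumpfile):
        return ["vzctl", "restore", str(ctid), "--dumpfile", dumpfile]
    return ["vzctl", "start", str(ctid)]


dump = "/vz/dump/Dump.101"  # hypothetical path on shared/replicated storage
print(stop_command(101, True, dump))
print(start_command(101, dump, exists=lambda p: True))
```

One design point such an option would have to settle: the dump file should be removed or invalidated after a successful restore, otherwise a later start could resume stale state.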

Let me know if that's what you want, and then we can discuss how to best
implement it.

