[Pacemaker] Live migration with order constraints

Fri Nov 11 07:59:55 UTC 2011

On 2011-11-11 01:23, Dmitry Golubev wrote:
>> ManageVE has migration support using chkpt/restore since resource-agents
>> version 1.0.4. .... but if I understand the OpenVZ migration concept
>> correctly ... please someone correct me if I'm wrong! ... there is no need
>> for a shared storage.
>>
>> The vzmigrate script rsyncs complete data, config and state between
>> nodes .... no shared storage needed.
>>
>> Of course you would need twice the diskspace, but this is also true for
>> DRBD replication. Extending ManageVE to use vzmigrate for live migration
>> looks quite straightforward to me.

Let me chime in here, as I originally added migrate_from and migrate_to
to ManageVE.

> My apologies - I did not notice ManageVE has migrate actions, as its usage help
> does not list them (somebody forgot to add them).

Sorry about that; I've added them now. As a general rule though, the
authoritative documentation for a resource agent is always "crm ra info
<agent>" or the RA man page -- in the ManageVE case, "man
ocf_heartbeat_ManageVE" -- both of which do mention migration. I have,
however, just updated the man page auto-generation so we get an
additional paragraph informing people that the RA supports native migration.

> However it is not so easy as
> it seems. The ManageVE makes a checkpoint and restores the machine (not
> vzmigrate, as I will explain further on), but it also needs a shared or
> migratable storage to place the dumpfile on. So the ManageVE does have exactly
> the same issue with migration as I mentioned. You can look at the source - it
> has a comment, which says exactly that.

Yes, and there is a simple reason for that: it's much faster than
vzmigrate. The checkpoint and restore can be completed in a matter of
seconds, and the incurred downtime is minimal. In HA configurations,
uptime is something people care about a lot, so it made sense to
implement it that way.
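
For anyone following along, the checkpoint/restore sequence boils down to
roughly the following. This is only a sketch, not the RA's actual code: the
function names, the container ID 101 and the dump path are made up for
illustration, and the helpers just print the vzctl calls rather than run
them. It assumes the dumpfile lives on storage both nodes can reach.

```shell
# Hypothetical helpers: print (do not execute) the vzctl calls used for a
# checkpoint-based migration. The dumpfile path is assumed to be on
# storage that is reachable from both nodes.
checkpoint_cmd() {
    # $1 = container ID, $2 = dumpfile path; meant for the source node
    echo "vzctl chkpnt $1 --dumpfile $2"
}

restore_cmd() {
    # $1 = container ID, $2 = dumpfile path; meant for the target node
    echo "vzctl restore $1 --dumpfile $2"
}
```

Calling "checkpoint_cmd 101 /shared/dump.101" prints the checkpoint command
for the source node, and "restore_cmd 101 /shared/dump.101" prints the
matching restore command for the target.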

> The vzmigrate script, on the other hand, works a bit differently: it transfers
> the whole virtual machine over the network. Now this approach has three
> obvious drawbacks. First, the need to send a huge amount of data over, so it will
> be very very slow (I've seen such migration take hours if the virtual machine
> is very large, say a terabyte), and, moreover, will slow down the complete disk
> subsystem of the current active node. Second, it will not be live at all, since
> it needs to suspend the machine and synchronize what's left unsynchronized
> during the first run (all the modifications that took place during the first rsync).
> It will also need to recalculate quota, which takes a lot of time as well (for
> a terabyte virtual machine I would estimate quota to calculate up to an hour,
> depending on the disk subsystem). And third, most importantly, there will be
> zero fault tolerance, as the copy on the second node is not being synchronized
> with the current primary. Now, I do not intend to say that vzmigrate is evil
> or incorrect: it has its purposes, and I've used it to migrate virtual
> machines to new disks (where disks can not be shared) many times, and I was
> very very happy with just how it works... but it is just not suitable for this
> particular purpose.
> 
> A filesystem on an active-passive DRBD, on the other hand, provides full online
> synchronization, so not only could the second node take over once the primary
> failed, but also live migration would be just a matter of dumping the memory
> file, unmounting the filesystem, remounting it on the other node and reading
> the memory file - fast, clean and simple.

Right. So the only thing you're saying is, rather than doing vzctl
stop during stop and vzctl chkpnt during migrate_to, you just always
want it to do vzctl chkpnt during stop, too? Well that's something we
can add to the existing RA -- again, no need to roll your own.
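
To make the idea concrete, here is a rough sketch of how the stop action
could choose between the two. It is purely hypothetical: the function name
and the "checkpoint" mode switch are invented for illustration and are not
existing ManageVE parameters, and the helper only prints the vzctl call
instead of running it.

```shell
# Hypothetical: pick the vzctl call for the stop action.
# $1 = mode ("checkpoint" or anything else), $2 = container ID,
# $3 = dumpfile path (only used in checkpoint mode)
stop_cmd() {
    if [ "$1" = "checkpoint" ]; then
        # Save the container state to the dumpfile instead of shutting down
        echo "vzctl chkpnt $2 --dumpfile $3"
    else
        # Plain shutdown, as the stop action does today
        echo "vzctl stop $2"
    fi
}
```

The start action would then need a matching check, restoring from the
dumpfile when one exists and doing a normal start otherwise.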

Let me know if that's what you want, and then we can discuss how to best
implement it.

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now