[Pacemaker] KVM live migration and multipath

Vladislav Bogdanov bubble at hoster-ok.com
Thu Jun 20 01:13:45 EDT 2013


20.06.2013 02:33, Sven Arnold wrote:
> Hi All,
> 
> I asked this on linux-ha but had no luck. Is there anybody here who has
> some hints for me or could tell me whether it is possible (and sensible)
> to live migrate a virtual machine if the disk image is provided by a
> multipath device?
> 
> I am not sure if my approach is flawed or if I am using the wrong or
> misconfigured tools.
> 
> Thanks a lot and sorry for crossposting,
> 
> Sven
> 
> 
> 
> Dear All,
> 
> I have set up a three-node cluster with shared storage (DRBD
> active/passive) which exports iSCSI volumes (TGT) containing KVM/QEMU
> disk images.
>
> The iSCSI target is configured as one resource and is accessible on two
> floating IP addresses to allow multipath I/O for speed and redundancy.
>
> The VM hosts access the volumes via open-iscsi using dm-multipath
> (grouping_policy multibus).
> 
> Migrating the iSCSI target from A to B works fine.
> But if I try to live migrate a virtual machine, I experience file system
> corruption inside the virtual machine. So somehow the switching of the
> iSCSI/multipath sessions is not handled properly by the VM hosts.

I think the problem is unrelated to iSCSI; your setup is correct (of
course I did not look through all the info thoroughly, but the idea is
perfectly correct).

Did you turn caching off for your VMs' disks?
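For reference, a minimal sketch (Python, standard library only) of one way to check this: it parses a libvirt domain XML and lists the disks whose 'driver' element does not set cache='none'. The embedded XML, the /dev/mapper/mpath* device names and the helper name are hypothetical examples, not Sven's actual configuration; on a real host you would parse the file given in the VirtualDomain 'config' parameter instead. The usual reason to insist on cache='none' for live migration on shared block storage is that the host page caches on the source and destination are not kept coherent, so the destination may read stale data.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML: vda has caching disabled, vdb does not.
SAMPLE = """
<domain type='kvm'>
  <name>vm</name>
  <devices>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/mapper/mpatha'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/mapper/mpathb'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def disks_without_cache_none(domain_xml):
    """Return target device names of disks whose <driver> lacks cache='none'."""
    root = ET.fromstring(domain_xml)
    offenders = []
    for disk in root.findall("./devices/disk"):
        driver = disk.find("driver")
        target = disk.find("target")
        name = target.get("dev") if target is not None else "?"
        # A missing <driver> or a missing/other cache mode both count as unsafe.
        if driver is None or driver.get("cache") != "none":
            offenders.append(name)
    return offenders


if __name__ == "__main__":
    print(disks_without_cache_none(SAMPLE))  # ['vdb']
```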

> 
> I have configured iSCSI timeouts rather short (noop_out_timeout 5
> seconds) and "no_path_retry queue" on the multipath device.
> 
> My question(s):
> 
> 1) Is it conceptually wrong what I am trying to accomplish?

No, I use almost the same setup in production, except that I use IET and
have cLVM on top of the LUNs.

> 
> 3) Is it valid to use "no_path_retry queue" in such a setup?

Yes, absolutely.

> 
> 4) Did I miss some important configuration options (timings, etc.)?

Since you use 'queue', the timings mostly do not matter: I/O is simply
queued until a path comes back instead of being failed to the VM.
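To make the point about 'queue' concrete: with a numeric no_path_retry, dm-multipath keeps queueing for roughly no_path_retry times polling_interval seconds after the last path fails and then fails I/O back to the guest; with 'queue' it never gives up. A small sketch of that arithmetic (the function name is made up for illustration, and the result is an approximation, not an exact kernel timing):

```python
def seconds_until_io_fails(no_path_retry, polling_interval):
    """Approximate time dm-multipath queues I/O after the last path fails.

    no_path_retry is "queue", "fail" or a retry count (int or numeric string).
    Returns None for "queue": I/O waits for a path to come back and is never
    failed back to the guest.
    """
    if no_path_retry == "queue":
        return None
    if no_path_retry == "fail":
        return 0
    retries = int(no_path_retry)
    if retries <= 0:
        raise ValueError("no_path_retry must be 'queue', 'fail' or a positive count")
    # One path-checker retry per polling interval.
    return retries * polling_interval


if __name__ == "__main__":
    print(seconds_until_io_fails("queue", 10))  # None
    print(seconds_until_io_fails(12, 10))       # 120
```

With Sven's polling_interval of 10, 'queue' yields None (never fails), while a hypothetical no_path_retry of 12 would fail I/O after about two minutes.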

> 
> 5) Is TGT multipath capable?

Multipathing is mostly an initiator-side concept, so I cannot see how the
target side could affect it (unless the target has some serious flaws with
request reordering).
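A simplified model of what the initiator does with 'path_grouping_policy multibus' and 'path_selector "round-robin 0"': all paths sit in one group, and dm-multipath sends rr_min_io requests down one path before moving on to the next. The target just sees I/O arriving on two sessions, which is why cross-path reordering is the only way I could imagine it interfering. This sketch ignores path priorities (rr_weight), path failures and the kernel's real request accounting; the device names are hypothetical.

```python
from itertools import islice


def round_robin_paths(paths, rr_min_io):
    """Yield the path used for each successive I/O in a single multibus group.

    Simplified model: rr_min_io I/Os go to one path, then the selector moves
    on to the next path; weights and failed paths are not modelled.
    """
    if not paths or rr_min_io < 1:
        raise ValueError("need at least one path and rr_min_io >= 1")
    while True:
        for path in paths:
            for _ in range(rr_min_io):
                yield path


if __name__ == "__main__":
    # Two iSCSI sessions (hypothetical device names), switching every 2 I/Os.
    print(list(islice(round_robin_paths(["sdb", "sdc"], 2), 6)))
    # ['sdb', 'sdb', 'sdc', 'sdc', 'sdb', 'sdb']
```

With rr_min_io 100 as in the posted config, the selector would switch paths every 100 requests.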

> 
> 
> Thank you all for any hints,
> 
> Sven
> 
> ===== Additional Information below: =====
> 
> - Cluster Layout
> - Environment
> - multipath configuration
> - iSCSI Timings
> - cib configuration (simplified and sorted)
> 
> Cluster Layout:
> ---------------
>     A                         B                    C
>  (active) <--- DRBD --->  (passive)
> 
> iSCSI Target
>   ip0 ip1        -- failover-->
> (floating IPs)
> 
> ---------------- iSCSI Initiator  -----------------------
>                    (two paths)
> 
> ----------------   Multipath I/O  -----------------------
> 
> ----------------    libvirt/KVM   -----------------------
> 
>     <------------------------------ failover ----- VM1
> 
> 
> Environment:
> ------------
> 
> Ubuntu 12.04.2 LTS
> kernel 3.5.0.34
> corosync           1.4.2-2
> cman               3.1.7-0ubuntu2.1
> pacemaker          1.1.6-2ubuntu3
> resource-agents    1:3.9.2-5ubuntu4.1
> tgt                1:1.0.17-1ubuntu2
> open-iscsi         2.0.871-0ubuntu9.12.04.2
> multipath-tools    0.4.9-3ubuntu5
> 
> multipath.conf:
> ---------------
> 
> defaults {
>         udev_dir                /dev
>         polling_interval        10
>         path_selector           "round-robin 0"
>         path_grouping_policy    multibus
>         path_checker            readsector0
>         rr_min_io               100
>         max_fds                 8192
>         rr_weight               priorities
>         failback                immediate
>         no_path_retry           queue
> }
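If you want to sanity-check these settings from a script, here is a rough sketch that extracts the key/value pairs of a flat section such as 'defaults' above. It is a simplification (one setting per line, no nested sub-sections like the 'device' blocks inside 'devices'), not a full multipath.conf parser; asking multipathd for its running configuration (multipathd -k"show config") remains the authoritative check of what is actually in effect.

```python
import re

# The 'defaults' section exactly as posted above.
MULTIPATH_CONF = """
defaults {
        udev_dir                /dev
        polling_interval        10
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        path_checker            readsector0
        rr_min_io               100
        max_fds                 8192
        rr_weight               priorities
        failback                immediate
        no_path_retry           queue
}
"""


def parse_section(text, section):
    """Return {key: value} for a flat multipath.conf section like 'defaults'.

    Assumes one "key value" pair per line and no nested sub-sections.
    Surrounding double quotes are stripped; '#' starts a comment.
    """
    pattern = rf"^\s*{re.escape(section)}\s*\{{(.*?)^\s*\}}"
    match = re.search(pattern, text, re.MULTILINE | re.DOTALL)
    if match is None:
        return {}
    settings = {}
    for raw in match.group(1).splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip('"') if len(parts) > 1 else ""
    return settings


if __name__ == "__main__":
    cfg = parse_section(MULTIPATH_CONF, "defaults")
    print(cfg["path_grouping_policy"], cfg["no_path_retry"])  # multibus queue
```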
> 
> 
> iSCSI timeouts (from /etc/iscsi/iscsid.conf):
> ---------------------------------------------
> 
> node.conn[0].timeo.logout_timeout = 15
> node.conn[0].timeo.login_timeout = 15
> node.conn[0].timeo.auth_timeout = 45
> node.conn[0].timeo.noop_out_interval = 5
> node.conn[0].timeo.noop_out_timeout = 5
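With these values a dead connection should be noticed fairly quickly: as I understand the open-iscsi README on NOP-Out pings, the worst case is roughly noop_out_interval + noop_out_timeout seconds, so about 10 seconds here. How long the session then waits before failing outstanding commands up to dm-multipath is governed by node.session.timeo.replacement_timeout, which is not part of the excerpt. A quick sketch that computes the estimate from iscsid.conf-style lines (helper names are hypothetical):

```python
# The excerpt exactly as posted above.
ISCSID_EXCERPT = """
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.auth_timeout = 45
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
"""


def parse_iscsid(text):
    """Parse 'key = value' lines; blank lines and '#' comments are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def nop_detection_estimate(settings, conn=0):
    """Rough worst-case seconds to notice a dead connection via NOP-Out pings."""
    prefix = f"node.conn[{conn}].timeo."
    interval = int(settings[prefix + "noop_out_interval"])
    timeout = int(settings[prefix + "noop_out_timeout"])
    return interval + timeout


if __name__ == "__main__":
    print(nop_detection_estimate(parse_iscsid(ISCSID_EXCERPT)))  # 10
```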
> 
> cib configuration (excerpt, slightly modified):
> -----------------------------------------------
> 
> primitive p-drbd-r0 ocf:linbit:drbd \
>         params drbd_resource="r0" \
>         op monitor interval="15"
> ms ms-drbd-r0 p-drbd-r0 \
>         meta master-max="1" master-node-max="1" clone-max="2" \
>         clone-node-max="1" notify="true"
> primitive p-vg_drbd ocf:heartbeat:LVM \
>         params volgrpname="vg_drbd" \
>         op monitor interval="30s" timeout="30s" \
>         op start interval="0" timeout="30s" \
>         op stop interval="0" timeout="30s"
> primitive p-iscsi-target ocf:heartbeat:iSCSITarget \
>         params iqn="<iscsi iqn>" tid="1" implementation="tgt" \
>         allowed_initiators="<..initiator ips..>" \
>         op monitor interval="15s"
> primitive p-lun1-vm_disk ocf:heartbeat:iSCSILogicalUnit \
>         params target_iqn="iqn.2013-03.de.localite:storage" lun="1" \
>         path="/dev/vg_drbd/vm_disk" implementation="tgt" vendor_id="STGT"
> primitive p-iscsiip0 ocf:heartbeat:IPaddr2 \
>         params ip="10.223.101.131" nic="eth2" cidr_netmask="26" \
>         op monitor interval="20s"
> primitive p-iscsiip1 ocf:heartbeat:IPaddr2 \
>         params ip="10.223.101.195" nic="eth3" cidr_netmask="26" \
>         op monitor interval="20s"
> group rg-iscsitarget p-iscsi-target p-lun1-vm_disk p-iscsiip0 p-iscsiip1
> primitive p-iscsi-initiator lsb:open-iscsi \
>         op monitor interval="30s"
> clone clone-iscsiinitiator p-iscsi-initiator \
>         meta interleave="true"
> primitive p-libvirtd lsb:libvirt-bin \
>         op monitor interval="30s"
> clone clone-libvirtd p-libvirtd \
>         meta interleave="true"
> primitive p-vm ocf:heartbeat:VirtualDomain \
>         params config="/etc/libvirt/qemu/vm.xml" \
>         migration_transport="tls" \
>         meta allow-migrate="true" \
>         op start interval="0" timeout="250s" \
>         op stop interval="0" timeout="300s" \
>         op monitor interval="60s" timeout="30s" \
>         op migrate_from interval="0" timeout="300s" \
>         op migrate_to interval="0" timeout="300s"
> colocation col-iscsitarget_on_drbd inf: rg-iscsitarget ms-drbd-r0:Master
> order o-drbd-r0_before_vg inf: ms-drbd-r0:promote p-vg_drbd:start
> order o-vg-drbd-r0_before_iscsitarget inf: p-vg_drbd rg-iscsitarget
> order o-iscsitarget_before_iscsiinitiator 0: rg-iscsitarget \
>         clone-iscsiinitiator
> order o-iscsiinitiator_before_libvirt 0: clone-iscsiinitiator \
>         clone-libvirtd
> order o-libvirt_before_vm inf: clone-libvirtd p-vm
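The order constraints above can be read as a dependency graph. Here is a sketch using Python's graphlib (3.9+) that derives the start sequence from them; it flattens 'ms-drbd-r0:promote' to the resource name and models nothing else of Pacemaker's semantics. Note that the two constraints with score 0 are advisory in Pacemaker: they only take effect when both resources are started or stopped in the same transition, so passing mandatory_only=True shows which ordering is actually enforced.

```python
from graphlib import TopologicalSorter

# (first, then, score) copied from the order constraints above.
ORDER_CONSTRAINTS = [
    ("ms-drbd-r0", "p-vg_drbd", "inf"),
    ("p-vg_drbd", "rg-iscsitarget", "inf"),
    ("rg-iscsitarget", "clone-iscsiinitiator", "0"),
    ("clone-iscsiinitiator", "clone-libvirtd", "0"),
    ("clone-libvirtd", "p-vm", "inf"),
]


def start_sequence(constraints, mandatory_only=False):
    """Return one valid start order; optionally drop advisory (score 0) edges."""
    graph = {}  # node -> set of predecessors
    for first, then, score in constraints:
        graph.setdefault(first, set())
        graph.setdefault(then, set())
        if mandatory_only and score == "0":
            continue  # advisory ordering: keep both nodes, drop the edge
        graph[then].add(first)
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(start_sequence(ORDER_CONSTRAINTS))
    print(start_sequence(ORDER_CONSTRAINTS, mandatory_only=True))
```

With all constraints the result is the single chain from ms-drbd-r0 to p-vm; with mandatory_only=True the initiator and libvirtd clones are no longer tied to the iSCSI target.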
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org




