[ClusterLabs] Live migration problem

Jan Pokorný jpokorny at redhat.com
Thu Oct 6 14:00:02 UTC 2016


On 05/10/16 13:02 -0400, Digimer wrote:
>   I just spent a fair bit of time debugging a weird error, and now that
> I've solved it, I wanted to share it on the list so that it is archived.
> With luck, it will save someone else some heartache. No replies are
> expected. :)
> 
> Environment:
> * Anvil m2 (RHEL 6.8, cman+rgmanager+kvm+drbd+clvmd, fully updated)
> * Guest VM OS - Win2012 R2 64-bit
> 
>   When I tried to live-migrate the server, rgmanager failed with:
> 
> [root@an-a07n02 ~]# clusvcadm -M Windows-Server-2012-R2 -m
> an-a07n02.alteeve.ca
> Trying to migrate service:Windows-Server-2012-R2 to
> an-a07n02.alteeve.ca...Failed; service running on original owner
> 
> /var/log/messages showed:
> ====
> Oct  4 19:15:05 an-a07n01 rgmanager[4213]: Migrating
> vm:Windows-Server-2012-R2 to an-a07n02.alteeve.ca
> Oct  4 19:15:41 an-a07n01 rgmanager[7588]: [vm] Migrate
> Windows-Server-2012-R2 to an-a07n02.alteeve.ca failed:
> Oct  4 19:15:41 an-a07n01 rgmanager[7610]: [vm] error: Unable to read
> from monitor: Connection reset by peer
> Oct  4 19:15:41 an-a07n01 rgmanager[4213]: migrate on vm
> "Windows-Server-2012-R2" returned 150 (unspecified)
> Oct  4 19:15:41 an-a07n01 rgmanager[4213]: Migration of
> vm:Windows-Server-2012-R2 to an-a07n02.alteeve.ca failed; return code 150
> ====
> 
> I disabled the VM in rgmanager, manually booted it using virsh and tried
> to live migrate it directly. Note that I booted the server on node 2
> fine, and was trying to migrate from 2 -> 1. Note also that '--unsafe'
> is required because nodes using 4 KiB sector disks can't use
> 'cache="none"' in KVM/qemu (we set 'writethrough' instead, so it is
> still safe).
> 
> [root@an-a07n02 ~]# virsh migrate --live Windows-Server-2012-R2
> qemu+ssh://an-a07n01.alteeve.ca/system --unsafe
> error: Unable to read from monitor: Connection reset by peer
> 
> In the qemu log file:
> 
> ====
> 2016-10-05 16:11:19.948+0000: starting up
> LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=spice
> /usr/libexec/qemu-kvm -name Windows-Server-2012-R2 -S -M rhel6.6.0 -cpu
> SandyBridge,+erms,+smep,+fsgsbase,+pdpe1gb,+rdrand,+f16c,+osxsave,+dca,+pcid,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme
> -enable-kvm -m 16384 -realtime mlock=off -smp
> 4,sockets=4,cores=1,threads=1 -uuid be69b994-0f70-ccf3-2934-43eb4a4b795b
> -nodefconfig -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/Windows-Server-2012-R2.monitor,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc
> base=localtime,driftfix=slew -no-reboot -no-shutdown -device
> ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device
> ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4
> -device
> ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1
> -device
> ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2
> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive
> file=/shared/files/Windows_2012_R2_64-bit_eval.iso,if=none,media=cdrom,id=drive-ide0-0-0,readonly=on,format=raw
> -device
> ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2
> -drive
> file=/shared/files/virtio-win-0.1.102.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw
> -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
> -drive
> file=/dev/an-a07n01_vg0/Windows-Server-2012-R2_0,if=none,id=drive-virtio-disk0,format=raw,cache=writethrough,aio=native
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -drive
> file=/dev/an-a07n01_vg0/Windows-Server-2012-R2_1,if=none,id=drive-virtio-disk1,format=raw,cache=writethrough,aio=native
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1
> -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:80:2d:e0,bus=pci.0,addr=0x3
> -chardev pty,id=charserial0 -device
> isa-serial,chardev=charserial0,id=serial0 -chardev
> spicevmc,id=charchannel0,name=vdagent -device
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
> -device usb-tablet,id=input0 -spice
> port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -vga
> qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864
> -incoming tcp:[::]:49152 -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
> char device redirected to /dev/pts/0
> Features 0x20000250 unsupported. Allowed features: 0x71000454
> qemu: warning: error while loading state for instance 0x0 of device
> '0000:00:06.0/virtio-blk'
> load of migration failed
> 2016-10-05 16:11:31.503+0000: shutting down
> ====
> 
> The key here was "qemu: warning: error while loading state for instance
> 0x0 of device '0000:00:06.0/virtio-blk'".
> 
> There was precious little matching this on Google. I could see no
> problems with the XML definition or with the backing LVs (there are two
> on this VM, and they are passed up raw to the guest).
> 
> Inside the guest OS, I could see no problems. I could, as mentioned
> above, boot the server on both nodes, but I could not live migrate.
> 
> I got to the point where I started throwing things against the wall out
> of desperation. One of those was to try updating the virtio-block
> drivers on the guest. The guest was built with 0.1.102 virtio stable
> drivers, and the latest stable is now 0.1.126. So I updated the drivers
> in Device Manager and voila! Migration started working.
> 
> We have many Win2012 R2 guests out in production, and many are using the
> .102 drivers. So I have a feeling that it wasn't so much the upgrade
> that made the difference, but instead the reinstall of the drivers.
> 
> I have no idea why this bug happened, but hopefully this might save
> someone some grief in the future if they hit the same.

Wild guess: your issue (the feature bit masks fit the picture) was fixed by
https://github.com/YanVugenfirer/kvm-guest-drivers-windows/commit/c6c0158
(via https://github.com/YanVugenfirer/kvm-guest-drivers-windows/pull/35),
a fix that was likely introduced in ".124".
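
To see why the bit masks fit, it helps to compare the two values from the
"Features 0x20000250 unsupported. Allowed features: 0x71000454" line in
the quoted qemu log. Below is a minimal sketch in Python. It assumes the
first mask is the feature set carried in the migration stream and the
second is what the destination device accepts; that is my reading of the
message, not something the log states. The names requested, allowed and
set_bits are made up for illustration.

```python
# Masks copied verbatim from the qemu log line:
#   "Features 0x20000250 unsupported. Allowed features: 0x71000454"
# Assumption: the first is what the incoming state asks for, the second
# is what the destination device is willing to accept.
requested = 0x20000250
allowed = 0x71000454


def set_bits(mask):
    """Return the positions of the bits set in mask, lowest first."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Bits that were requested but are not in the allowed set.
rejected = requested & ~allowed

print("requested bits:", set_bits(requested))
print("allowed bits:  ", set_bits(allowed))
print("rejected:      ", hex(rejected), set_bits(rejected))
```

Only a single bit (bit 9, 0x200) falls outside the allowed set. If I read
the virtio spec right, for a block device that bit is VIRTIO_BLK_F_FLUSH,
which would be consistent with a guest driver negotiating a feature the
device had not offered.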

Nice that you didn't give up :-)

-- 
Jan (Poki)