[ClusterLabs] Error at testing live migration

Wilson Acero rasalax at hotmail.com
Fri Mar 27 21:40:18 UTC 2015


Hi Ken, thanks for your answer. Before running the live migration tests, I tested how Pacemaker handles shutting down the virtual machine. With the command "pcs cluster standby nodoX" there were no errors, but when rebooting or shutting down the node, the virtual machine resource and the gfs2wa and iscsi resources failed and the node became UNCLEAN. After a lot of testing, I modified my /usr/lib/systemd/system/corosync.service file and added these entries:

After=iscsid.service
After=remote-fs.target
After=libvirtd.service

That solved the shutdown/reboot error: Pacemaker now has enough time to shut down the virtual machine, restart it on another node, and then continue rebooting the node. However, live migration still fails when I test it.
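
For reference, the same ordering can also be kept in a drop-in file instead of editing the packaged unit under /usr/lib, which a package update may overwrite. A minimal sketch, assuming a drop-in named /etc/systemd/system/corosync.service.d/ordering.conf (the file name is only an illustration; any *.conf name in that directory should work):

```ini
# /etc/systemd/system/corosync.service.d/ordering.conf (hypothetical file name)
# Start corosync after these units. systemd stops units in the reverse
# order, so corosync is stopped before iscsid, remote-fs.target and libvirtd.
[Unit]
After=iscsid.service
After=remote-fs.target
After=libvirtd.service
```

After creating the file, "systemctl daemon-reload" makes systemd re-read the unit, and "systemctl show -p After corosync.service" should then list the three units.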

I added your modification to /usr/lib/systemd/system/pacemaker.service, but it did not work.

Searching for this error, I found that systemd now includes "systemd-machined.service", a service for monitoring, starting, and shutting down virtual machines with the machinectl command. I tried to disable it, but libvirt needs it to run a virtual machine.

[root at nodo3 system]# machinectl
MACHINE                          CONTAINER SERVICE
qemu-centos2                     vm        libvirt-qemu

1 machines listed.
[root at nodo3 system]#
[root at nodo3 system]# systemctl status systemd-machined.service
systemd-machined.service - Virtual Machine and Container Registration Service
   Loaded: loaded (/usr/lib/systemd/system/systemd-machined.service; static)
   Active: active (running) since Fri 2015-03-27 16:13:20 ECT; 22min ago
     Docs: man:systemd-machined.service(8)
           http://www.freedesktop.org/wiki/Software/systemd/machined
 Main PID: 2982 (systemd-machine)
   CGroup: /system.slice/systemd-machined.service
           └─2982 /usr/lib/systemd/systemd-machined

Mar 27 16:13:20 nodo3.redwa.local systemd[1]: Starting Virtual Machine and Container Registration Service...
Mar 27 16:13:20 nodo3.redwa.local systemd[1]: Started Virtual Machine and Container Registration Service.
Mar 27 16:13:20 nodo3.redwa.local systemd-machined[2982]: New machine qemu-centos2.

I suspect that service is the culprit, but I don't know how to deal with it.

Thanks a lot. 

From: rasalax at hotmail.com
To: users at clusterlabs.org
Subject: Error at testing live migration
Date: Fri, 27 Mar 2015 12:46:47 -0500




Hi everybody, 
I have a Pacemaker + Corosync cluster that manages a virtual machine (KVM). The virtual machine's disks are stored on shared storage (GFS2 + LVM + iSCSI LUN). The resource agent is VirtualDomain.
When I test live migration with the command 'pcs resource move vmcentos2 nodo2', or by putting the node in standby, the migration works with no problem.
But when I test live migration by rebooting or shutting down the node that runs the virtual machine, the migration fails. Is this expected behaviour or a bug?
My cluster configuration is:
OS = CentOS 7
Pacemaker 1.1.10-32.el7_0.1
Corosync Cluster Engine, version '2.3.3'

[root at nodo2 ~]# pcs status
Cluster name: clusterwa
Last updated: Fri Mar 27 12:20:04 2015
Last change: Thu Mar 26 16:11:11 2015 via crm_resource on nodo2
Stack: corosync
Current DC: nodo2 (2) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
5 Nodes configured
29 Resources configured

Online: [ nodo2 nodo3 nodo4 ]
Containers: [ centos1.7:vmcentos3 ]

Full list of resources:

 wti_wa (stonith:fence_wti):    Started nodo3
 Clone Set: dlmwa-clone [dlmwa]
     Started: [ nodo2 nodo3 nodo4 ]
     Stopped: [ centos1.7 centosSC3 ]
 Clone Set: clvmwa-clone [clvmwa]
     Started: [ nodo2 nodo3 nodo4 ]
     Stopped: [ centos1.7 centosSC3 ]
 Clone Set: gfs2wa-clone [gfs2wa]
     Started: [ nodo2 nodo3 nodo4 ]
     Stopped: [ centos1.7 centosSC3 ]
 vmcentos2      (ocf::heartbeat:VirtualDomain): Started nodo2
 Clone Set: iscsiwa-clone [iscsiwa]
     Started: [ nodo2 nodo3 nodo4 ]
     Stopped: [ centos1.7 centosSC3 ]

PCSD Status:
  nodo2: Online
  nodo3: Online
  nodo4: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Many thanks.