[ClusterLabs] resources do not migrate although node is going to standby

Lentes, Bernd bernd.lentes at helmholtz-muenchen.de
Mon Jul 24 14:52:45 EDT 2017


Hi,

Just to be sure:
I have a VirtualDomain resource (called prim_vm_servers_alive) running on one node (ha-idg-2). For reasons I don't remember, I have a location constraint:
location cli-prefer-prim_vm_servers_alive prim_vm_servers_alive role=Started inf: ha-idg-2
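
I guess this cli-prefer-* constraint is a leftover from an earlier "crm resource migrate". If it turns out to be in the way, I assume it could be cleared with something like this (commands from memory, not verified):

crm resource unmigrate prim_vm_servers_alive
# or delete the constraint directly:
crm configure delete cli-prefer-prim_vm_servers_alive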

Now I'm trying to set this node to standby, because I need to reboot it.
From what I understand now, the resource can't migrate to node ha-idg-1 because of this constraint. Right?
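
For reference, I put the node into standby roughly like this (exact invocation from memory):

crm node standby ha-idg-2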

That's what the log says:
Jul 21 18:03:50 ha-idg-2 VirtualDomain(prim_vm_servers_alive)[28565]: ERROR: Server_Monitoring: live migration to qemu+ssh://ha-idg-1/system  failed: 1
Jul 21 18:03:50 ha-idg-2 lrmd[8573]:   notice: operation_finished: prim_vm_servers_alive_migrate_to_0:28565:stderr [ error: Requested operation is not valid: domain 'Server_Monitoring' is already active ]
Jul 21 18:03:50 ha-idg-2 crmd[8576]:   notice: process_lrm_event: Operation prim_vm_servers_alive_migrate_to_0: unknown error (node=ha-idg-2, call=114, rc=1, cib-update=572, confirmed=true)
Jul 21 18:03:50 ha-idg-2 crmd[8576]:   notice: process_lrm_event: ha-idg-2-prim_vm_servers_alive_migrate_to_0:114 [ error: Requested operation is not valid: domain 'Server_Monitoring' is already active\n ]
Jul 21 18:03:50 ha-idg-2 crmd[8576]:  warning: status_from_rc: Action 64 (prim_vm_servers_alive_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1): Error
Jul 21 18:03:50 ha-idg-2 crmd[8576]:   notice: abort_transition_graph: Transition aborted by prim_vm_servers_alive_migrate_to_0 'modify' on ha-idg-2: Event failed (magic=0:1;64:417:0:656ecd4a-f8e8-46c9-b4e6-194616237988, cib=0.879.5, source=match_graph_event:350, 0)
Jul 21 18:03:50 ha-idg-2 crmd[8576]:  warning: status_from_rc: Action 64 (prim_vm_servers_alive_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1): Error

That's how I understand "Requested operation is not valid": the migration isn't possible because of the constraint.
I just wanted to be sure. And because the resource can't be migrated, but the host is going into standby, it is stopped instead. Right?
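
I suppose I could have previewed beforehand what the cluster would do with something like this (flags from memory; -s shows scores, -L uses the live CIB):

crm_simulate -sL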

What's strange is that a second resource running on node ha-idg-2, called prim_vm_mausdb, also didn't migrate to the other node. And that's something I don't understand completely.
That resource doesn't have any location constraint.
Both VirtualDomains have a VNC server configured (so that I can monitor the boot procedure if there are startup problems). The VNC port for prim_vm_mausdb is fixed to 5900 in the configuration file.
The port is set to auto for prim_vm_servers_alive, because I forgot to configure a fixed one. So it must be something like 5900+n, because both resources were running concurrently on the same node.
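
If it helps, the relevant <graphics> lines of the two domains should look roughly like this (XML paraphrased from memory; only the 127.0.0.1 listen address is confirmed by the log below):

virsh dumpxml mausdb_vm | grep '<graphics'
    <graphics type='vnc' port='5900' autoport='no' listen='127.0.0.1'/>
virsh dumpxml Server_Monitoring | grep '<graphics'
    <graphics type='vnc' port='-1' autoport='yes' listen='127.0.0.1'/>
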
But prim_vm_mausdb can't migrate because its port is occupied on the other node, ha-idg-1:

Jul 21 18:03:53 ha-idg-2 VirtualDomain(prim_vm_mausdb)[28564]: ERROR: mausdb_vm: live migration to qemu+ssh://ha-idg-1/system  failed: 1
Jul 21 18:03:53 ha-idg-2 lrmd[8573]:   notice: operation_finished: prim_vm_mausdb_migrate_to_0:28564:stderr [ error: internal error: early end of file from monitor: possible problem: ]
Jul 21 18:03:53 ha-idg-2 lrmd[8573]:   notice: operation_finished: prim_vm_mausdb_migrate_to_0:28564:stderr [ Failed to start VNC server on `127.0.0.1:0,share=allow-exclusive': Failed to bind socket: Address already in use ]
Jul 21 18:03:53 ha-idg-2 lrmd[8573]:   notice: operation_finished: prim_vm_mausdb_migrate_to_0:28564:stderr [  ]
Jul 21 18:03:53 ha-idg-2 crmd[8576]:   notice: process_lrm_event: Operation prim_vm_mausdb_migrate_to_0: unknown error (node=ha-idg-2, call=110, rc=1, cib-update=573, confirmed=true)
Jul 21 18:03:53 ha-idg-2 crmd[8576]:   notice: process_lrm_event: ha-idg-2-prim_vm_mausdb_migrate_to_0:110 [ error: internal error: early end of file from monitor: possible problem:\nFailed to start VNC server on `127.0.0.1:0,share=allow-exclusive': Failed to bind socket: Address already in use\n\n ]
Jul 21 18:03:53 ha-idg-2 crmd[8576]:  warning: status_from_rc: Action 51 (prim_vm_mausdb_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1): Error
Jul 21 18:03:53 ha-idg-2 crmd[8576]:  warning: status_from_rc: Action 51 (prim_vm_mausdb_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1): Error

Do I understand correctly that the port is occupied on the node the resource should migrate to (ha-idg-1)?
But there is no VM running there, and I don't have a standalone VNC server configured. Why is the port occupied?
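
As far as I know, VNC display 0 corresponds to TCP port 5900, so the `127.0.0.1:0' in the error above means port 5900. Something like this on ha-idg-1 should show which process holds it (ss options from memory):

ss -tlnp | grep ':59'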

By the way: are the network sockets also live-migrated during a live migration of a VirtualDomain resource?
I would expect them to be.

Thanks.


Bernd



-- 
Bernd Lentes 

System Administration
Institute of Developmental Genetics
Building 35.34 - Room 208
HelmholtzZentrum München 
bernd.lentes at helmholtz-muenchen.de 
phone: +49 (0)89 3187 1241 
fax: +49 (0)89 3187 2294 

no backup - no mercy
 
