[ClusterLabs] Re: Another word of warning regarding VirtualDomain and Live Migration

Ken Gaillot kgaillot at redhat.com
Wed Dec 16 14:13:02 EST 2020


On Wed, 2020-12-16 at 10:06 +0100, Ulrich Windl wrote:
> Hi!
> 
> (I changed the subject of the thread)
> VirtualDomain seems to be broken, as it does not handle a failed
> live migration correctly:
> 
> With my test-VM running on node h16, this happened when I tried to
> move it away (for testing):
> 
> Dec 16 09:28:46 h19 pacemaker-schedulerd[4427]:  notice:  *
> Migrate    prm_xen_test-jeos                    ( h16 -> h19 )
> Dec 16 09:28:46 h19 pacemaker-controld[4428]:  notice: Initiating
> migrate_to operation prm_xen_test-jeos_migrate_to_0 on h16
> Dec 16 09:28:47 h19 pacemaker-controld[4428]:  notice: Transition 840
> aborted by operation prm_xen_test-jeos_migrate_to_0 'modify' on h16:
> Event failed
> Dec 16 09:28:47 h19 pacemaker-controld[4428]:  notice: Transition 840
> action 115 (prm_xen_test-jeos_migrate_to_0 on h16): expected 'ok' but
> got 'error'
> Dec 16 09:28:47 h19 pacemaker-schedulerd[4427]:  warning: Unexpected
> result (error: test-jeos: live migration to h19 failed: 1) was
> recorded for migrate_to of prm_xen_test-jeos on h16 at Dec 16
> 09:28:46 2020
> Dec 16 09:28:47 h19 pacemaker-schedulerd[4427]:  warning: Unexpected
> result (error: test-jeos: live migration to h19 failed: 1) was
> recorded for migrate_to of prm_xen_test-jeos on h16 at Dec 16
> 09:28:46 2020
> ### (note that the message above is duplicated!)

A bit confusing, but that's because the operation is recorded twice,
once on its own, and once as "last_failure". If the operation later
succeeds, the "on its own" entry will be overwritten by the success,
but the "last_failure" will stick around until the resource is cleaned
up. That's how failures can continue to be shown in status after a
later success.
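For what it's worth, you can see both records in the status section of
the CIB and clear the lingering last_failure with a cleanup -- roughly
along these lines (resource and node names taken from your logs; the
exact entry names may differ slightly):

  # dump the status section and look for the resource's operation
  # history, including the separate "last_failure" record
  cibadmin --query --scope status | grep -E 'prm_xen_test-jeos_(migrate_to|last_failure)'

  # clear the recorded failure for the resource on h16
  crm_resource --cleanup --resource prm_xen_test-jeos --node h16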

> Dec 16 09:28:47 h19 pacemaker-schedulerd[4427]:  error: Resource
> prm_xen_test-jeos is active on 2 nodes (attempting recovery)
> ### This is nonsense after a failed live migration!

That's just wording; it should probably say "could be active". From
Pacemaker's point of view, since it doesn't have any service-specific
intelligence, a failed migration might have left the resource active on
one node, the other, or both.
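A quick way to compare what the cluster has recorded with what is
actually running is something like this (just a sketch; run the virsh
part on each node):

  # where the cluster currently believes the resource is active
  crm_resource --locate --resource prm_xen_test-jeos

  # what libvirt actually has running on this node (run on both h16 and h19)
  virsh list --all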

> Dec 16 09:28:47 h19 pacemaker-schedulerd[4427]:  notice:  *
> Recover    prm_xen_test-jeos                    (             h19 )
> 
> 
> So the cluster is doing exactly the wrong thing: The VM is still
> active on h16, while a "recovery" on h19 will start it there! So
> _after_ the recovery the VM is running on both nodes.

The problem here is that a stop should be scheduled on both nodes, not
just one of them. Then the start is scheduled on only one node.

Do you have the pe input from this transition?
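If you still have the saved input, replaying it should show exactly
what was scheduled -- something like the following (the NNN is a
placeholder; the real file name is logged in the "saving inputs in ..."
message for that transition):

  # re-run the scheduler on the saved transition input and print the
  # actions it would schedule
  crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-input-NNN.bz2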

> Dec 16 09:28:47 h19 pacemaker-controld[4428]:  notice: Initiating
> stop operation prm_xen_test-jeos_stop_0 locally on h19
> Dec 16 09:28:47 h19 VirtualDomain(prm_xen_test-jeos)[20656]: INFO:
> Domain test-jeos already stopped.
> Dec 16 09:28:47 h19 pacemaker-execd[4425]:  notice: prm_xen_test-jeos 
> stop (call 372, PID 20620) exited with status 0 (execution time
> 283ms, queue time 0ms)
> Dec 16 09:28:47 h19 pacemaker-controld[4428]:  notice: Result of stop
> operation for prm_xen_test-jeos on h19: ok
> Dec 16 09:31:45 h19 pacemaker-controld[4428]:  notice: Initiating
> start operation prm_xen_test-jeos_start_0 locally on h19
> 
> Dec 16 09:31:47 h19 pacemaker-execd[4425]:  notice: prm_xen_test-jeos 
> start (call 373, PID 21005) exited with status 0 (execution time
> 2715ms, queue time 0ms)
> Dec 16 09:31:47 h19 pacemaker-controld[4428]:  notice: Result of
> start operation for prm_xen_test-jeos on h19: ok
> Dec 16 09:33:46 h19 pacemaker-schedulerd[4427]:  warning: Unexpected
> result (error: test-jeos: live migration to h19 failed: 1) was
> recorded for migrate_to of prm_xen_test-jeos on h16 at Dec 16
> 09:28:46 2020
> 
> Amazingly, manual migration using virsh worked:
> virsh migrate --live test-jeos xen+tls://h18...
> 
> Regards,
> Ulrich Windl
> 
> 
> > > > Ulrich Windl wrote on 14.12.2020 at 15:21 in message
> > > > <5FD774CF.8DE : 161 : 60728>:
> > Hi!
> > 
> > I think I found the reason why a VM is started on two nodes:
> > 
> > Live migration had failed (e.g. away from h16), so the cluster uses
> > stop and start (stop on h16, start on h19 for example).
> > When rebooting h16, I see these messages (h19 is DC):
> > 
> > Dec 14 15:09:27 h19 pacemaker-schedulerd[4427]:  warning:
> > Unexpected result 
> > (error: test-jeos: live migration to h16 failed: 1) was recorded
> > for 
> > migrate_to of prm_xen_test-jeos on h19 at Dec 14 11:54:08 2020
> > Dec 14 15:09:27 h19 pacemaker-schedulerd[4427]:  error: Resource 
> > prm_xen_test-jeos is active on 2 nodes (attempting recovery)
> > 
> > Dec 14 15:09:27 h19 pacemaker-schedulerd[4427]:  notice:  *
> > Restart    
> > prm_xen_test-jeos                    (             h16 )
> > 
> > THIS IS WRONG: h16 was just rebooted, so no VM is running on h16
> > (unless there was some autostart from libvirt; "virsh list
> > --autostart" does not list any).
> > 
> > Dec 14 15:09:27 h16 VirtualDomain(prm_xen_test-jeos)[4850]: INFO:
> > Domain 
> > test-jeos already stopped.
> > 
> > Dec 14 15:09:27 h19 pacemaker-schedulerd[4427]:  error: Calculated 
> > transition 669 (with errors), saving inputs in 
> > /var/lib/pacemaker/pengine/pe-error-4.bz2
> > 
> > What's going on here?
> > 
> > Regards,
> > Ulrich
> > 
> > > > > Ulrich Windl wrote on 14.12.2020 at 08:15 in message
> > > > > <5FD7110D.D09 : 161 : 60728>:
> > > Hi!
> > > 
> > > Another word of warning regarding VirtualDomain: While configuring a
> > > 3-node cluster with SLES15 SP2 for Xen PVM (using libvirt and the
> > > VirtualDomain RA), I had created a test VM using BtrFS.
> > > At some point during testing, the cluster ended up with the test VM
> > > running on more than one node (for reasons still to be examined).
> > > Only after a "crm resource refresh" (reprobe) did the cluster try to
> > > fix the problem.
> > > Well, at some point the VM wouldn't start any more, because the
> > > BtrFS used for everything (the SLES default) was corrupted in a way
> > > that seems unrecoverable, independently of how many subvolumes and
> > > snapshots of those may exist.
> > > 
> > > Initially I would guess the libvirt stack and VirtualDomain are less
> > > reliable than the old Xen method and RA.
> > > 
> > > Regards,
> > > Ulrich
> > > 
> > > 
> > > 
> > 
> > 
> > 
> > 
> 
> 
> 
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
> 
-- 
Ken Gaillot <kgaillot at redhat.com>


