[ClusterLabs] Antw: Another word of warning regarding VirtualDomain and Live Migration
Ken Gaillot
kgaillot at redhat.com
Wed Dec 16 14:13:02 EST 2020
On Wed, 2020-12-16 at 10:06 +0100, Ulrich Windl wrote:
> Hi!
>
> (I changed the subject of the thread)
> VirtualDomain seems to be broken, as it does not handle a failed
> live migration correctly:
>
> With my test-VM running on node h16, this happened when I tried to
> move it away (for testing):
>
> Dec 16 09:28:46 h19 pacemaker-schedulerd[4427]: notice: *
> Migrate prm_xen_test-jeos ( h16 -> h19 )
> Dec 16 09:28:46 h19 pacemaker-controld[4428]: notice: Initiating
> migrate_to operation prm_xen_test-jeos_migrate_to_0 on h16
> Dec 16 09:28:47 h19 pacemaker-controld[4428]: notice: Transition 840
> aborted by operation prm_xen_test-jeos_migrate_to_0 'modify' on h16:
> Event failed
> Dec 16 09:28:47 h19 pacemaker-controld[4428]: notice: Transition 840
> action 115 (prm_xen_test-jeos_migrate_to_0 on h16): expected 'ok' but
> got 'error'
> Dec 16 09:28:47 h19 pacemaker-schedulerd[4427]: warning: Unexpected
> result (error: test-jeos: live migration to h19 failed: 1) was
> recorded for migrate_to of prm_xen_test-jeos on h16 at Dec 16
> 09:28:46 2020
> Dec 16 09:28:47 h19 pacemaker-schedulerd[4427]: warning: Unexpected
> result (error: test-jeos: live migration to h19 failed: 1) was
> recorded for migrate_to of prm_xen_test-jeos on h16 at Dec 16
> 09:28:46 2020
> ### (note the message above is duplicate!)
A bit confusing, but that's because the operation is recorded twice,
once on its own, and once as "last_failure". If the operation later
succeeds, the "on its own" entry will be overwritten by the success,
but the "last_failure" will stick around until the resource is cleaned
up. That's how failures can continue to be shown in status after a
later success.
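
As a rough illustration of that double bookkeeping, here is a toy model
in Python. The class and field names are made up for the example and are
not Pacemaker's real CIB schema or internals; it only shows why one
failure is reported twice, and why one report survives a later success
until cleanup.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


# Toy model only: names are illustrative, not Pacemaker's actual
# data structures.
@dataclass
class ResourceHistory:
    # Most recent result per operation name, e.g. "migrate_to" -> rc
    latest: Dict[str, int] = field(default_factory=dict)
    # Sticky copy of the most recent failure: (operation, rc)
    last_failure: Optional[Tuple[str, int]] = None

    def record(self, op: str, rc: int) -> None:
        self.latest[op] = rc  # overwritten by any later result
        if rc != 0:
            self.last_failure = (op, rc)  # kept until cleanup()

    def cleanup(self) -> None:
        self.latest.clear()
        self.last_failure = None

    def failure_reports(self) -> int:
        """Count how many recorded entries describe a failure."""
        count = sum(1 for rc in self.latest.values() if rc != 0)
        if self.last_failure is not None:
            count += 1
        return count


history = ResourceHistory()
history.record("migrate_to", 1)      # failed migration
after_failure = history.failure_reports()
history.record("migrate_to", 0)      # later success overwrites "latest"
after_success = history.failure_reports()
history.cleanup()                    # only cleanup drops last_failure
after_cleanup = history.failure_reports()
print(after_failure, after_success, after_cleanup)  # prints: 2 1 0
```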
> Dec 16 09:28:47 h19 pacemaker-schedulerd[4427]: error: Resource
> prm_xen_test-jeos is active on 2 nodes (attempting recovery)
> ### This is nonsense after a failed live migration!
That's just wording; it should probably say "could be active". From
Pacemaker's point of view, since it doesn't have any service-specific
intelligence, a failed migration might have left the resource active on
one node, the other, or both.
> Dec 16 09:28:47 h19 pacemaker-schedulerd[4427]: notice: *
> Recover prm_xen_test-jeos ( h19 )
>
>
> So the cluster is doing exactly the wrong thing: The VM is still
> active on h16, while a "recovery" on h19 will start it there! So
> _after_ the recovery the VM is duplicated.
The problem here is that a stop should be scheduled on both nodes, not
just one of them. Then the start is scheduled on only one node.
Do you have the pe input from this transition?
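
To spell out the expected behavior, a minimal sketch in Python follows.
The function name and the tuple format are hypothetical, and this is not
the scheduler's actual algorithm; it only encodes the rule described
above: treat both ends of a failed migration as possibly active, stop on
each, then start on exactly one node.

```python
def plan_recovery(source, target, start_on):
    """Return ordered (action, node) pairs for a failed migration.

    Hypothetical helper for illustration, not Pacemaker code.
    """
    # Without service-specific knowledge, the resource could be active
    # on the source, the target, or both.
    possibly_active = sorted({source, target})
    # Stop everywhere it might be running ...
    actions = [("stop", node) for node in possibly_active]
    # ... and only then start it on a single node.
    actions.append(("start", start_on))
    return actions


plan = plan_recovery("h16", "h19", start_on="h19")
print(plan)  # [('stop', 'h16'), ('stop', 'h19'), ('start', 'h19')]
```

In the log above only the stop on h19 was initiated, which is the part
that does not match this expectation.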
> Dec 16 09:28:47 h19 pacemaker-controld[4428]: notice: Initiating
> stop operation prm_xen_test-jeos_stop_0 locally on h19
> Dec 16 09:28:47 h19 VirtualDomain(prm_xen_test-jeos)[20656]: INFO:
> Domain test-jeos already stopped.
> Dec 16 09:28:47 h19 pacemaker-execd[4425]: notice: prm_xen_test-jeos
> stop (call 372, PID 20620) exited with status 0 (execution time
> 283ms, queue time 0ms)
> Dec 16 09:28:47 h19 pacemaker-controld[4428]: notice: Result of stop
> operation for prm_xen_test-jeos on h19: ok
> Dec 16 09:31:45 h19 pacemaker-controld[4428]: notice: Initiating
> start operation prm_xen_test-jeos_start_0 locally on h19
>
> Dec 16 09:31:47 h19 pacemaker-execd[4425]: notice: prm_xen_test-jeos
> start (call 373, PID 21005) exited with status 0 (execution time
> 2715ms, queue time 0ms)
> Dec 16 09:31:47 h19 pacemaker-controld[4428]: notice: Result of
> start operation for prm_xen_test-jeos on h19: ok
> Dec 16 09:33:46 h19 pacemaker-schedulerd[4427]: warning: Unexpected
> result (error: test-jeos: live migration to h19 failed: 1) was
> recorded for migrate_to of prm_xen_test-jeos on h16 at Dec 16
> 09:28:46 2020
>
> Amazingly, manual migration using virsh worked:
> virsh migrate --live test-jeos xen+tls://h18...
>
> Regards,
> Ulrich Windl
>
>
> > > > Ulrich Windl wrote on 14.12.2020 at 15:21 in message
> > > > <5FD774CF.8DE : 161 : 60728>:
> > Hi!
> >
> > I think I found the reason why a VM is started on two nodes:
> >
> > Live migration had failed (e.g. away from h16), so the cluster uses
> > stop and start (stop on h16, start on h19, for example).
> > When rebooting h16, I see these messages (h19 is DC):
> >
> > Dec 14 15:09:27 h19 pacemaker-schedulerd[4427]: warning: Unexpected
> > result (error: test-jeos: live migration to h16 failed: 1) was
> > recorded for migrate_to of prm_xen_test-jeos on h19 at Dec 14
> > 11:54:08 2020
> > Dec 14 15:09:27 h19 pacemaker-schedulerd[4427]: error: Resource
> > prm_xen_test-jeos is active on 2 nodes (attempting recovery)
> >
> > Dec 14 15:09:27 h19 pacemaker-schedulerd[4427]: notice: *
> > Restart prm_xen_test-jeos ( h16 )
> >
> > THIS IS WRONG: h16 had just been rebooted, so no VM is running on
> > h16 (unless there was some autostart from libvirt, but
> > "virsh list --autostart" does not list any).
> >
> > Dec 14 15:09:27 h16 VirtualDomain(prm_xen_test-jeos)[4850]: INFO:
> > Domain test-jeos already stopped.
> >
> > Dec 14 15:09:27 h19 pacemaker-schedulerd[4427]: error: Calculated
> > transition 669 (with errors), saving inputs in
> > /var/lib/pacemaker/pengine/pe-error-4.bz2
> >
> > What's going on here?
> >
> > Regards,
> > Ulrich
> >
> > > > > Ulrich Windl wrote on 14.12.2020 at 08:15 in message
> > > > > <5FD7110D.D09 : 161 : 60728>:
> > > Hi!
> > >
> > > Another word of warning regarding VirtualDomain: While
> > > configuring a 3-node cluster with SLES15 SP2 for Xen PVM (using
> > > libvirt and the VirtualDomain RA), I had created a test VM using
> > > BtrFS.
> > > At some point during testing, the cluster ended up with the test
> > > VM running on more than one node (for reasons still to be
> > > examined). Only after a "crm resource refresh" (reprobe) did the
> > > cluster try to fix the problem.
> > > Well, at some point the VM wouldn't start any more, because the
> > > BtrFS used for everything (the SLES default) was corrupted in a
> > > way that seems unrecoverable, independently of how many
> > > subvolumes and snapshots of those may exist.
> > >
> > > My initial guess is that the libvirt stack and VirtualDomain are
> > > less reliable than the old Xen method and RA.
> > >
> > > Regards,
> > > Ulrich
> > >
--
Ken Gaillot <kgaillot at redhat.com>