[ClusterLabs] Antw: Another word of warning regarding VirtualDomain and BtrFS (SLES15 SP2)

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Dec 14 09:21:03 EST 2020


Hi!

I think I found the reason why a VM is started on two nodes:

Live migration had failed (e.g. away from h16), so the cluster used stop and start instead (stop on h16, start on h19, for example).
When rebooting h16, I see these messages (h19 is DC):

Dec 14 15:09:27 h19 pacemaker-schedulerd[4427]:  warning: Unexpected result (error: test-jeos: live migration to h16 failed: 1) was recorded for migrate_to of prm_xen_test-jeos on h19 at Dec 14 11:54:08 2020
Dec 14 15:09:27 h19 pacemaker-schedulerd[4427]:  error: Resource prm_xen_test-jeos is active on 2 nodes (attempting recovery)

Dec 14 15:09:27 h19 pacemaker-schedulerd[4427]:  notice:  * Restart    prm_xen_test-jeos                    (             h16 )

THIS IS WRONG: h16 had just been rebooted, so no VM is running on h16 (unless there was some autostart from libvirt; "virsh list --autostart" does not list any).

Dec 14 15:09:27 h16 VirtualDomain(prm_xen_test-jeos)[4850]: INFO: Domain test-jeos already stopped.

Dec 14 15:09:27 h19 pacemaker-schedulerd[4427]:  error: Calculated transition 669 (with errors), saving inputs in /var/lib/pacemaker/pengine/pe-error-4.bz2

What's going on here?
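
To find such reports in a longer log, a small Python sketch like the following could scan the lines for the scheduler message quoted above. The function name find_multi_active and the regular expression are illustrative assumptions derived only from the message text shown here; other Pacemaker versions may word the message differently.

```python
import re

# Assumed message format, taken from the log line quoted above:
# "... error: Resource <name> is active on <N> nodes (attempting recovery)"
ACTIVE_RE = re.compile(r"Resource (\S+) is active on (\d+) nodes")


def find_multi_active(log_lines):
    """Map each resource reported active on more than one node
    to the highest node count seen for it."""
    hits = {}
    for line in log_lines:
        m = ACTIVE_RE.search(line)
        if m is None:
            continue
        name, count = m.group(1), int(m.group(2))
        if count > 1:
            hits[name] = max(count, hits.get(name, 0))
    return hits


# Two of the log lines from this message, used as sample input.
sample = [
    "Dec 14 15:09:27 h19 pacemaker-schedulerd[4427]:  error: Resource "
    "prm_xen_test-jeos is active on 2 nodes (attempting recovery)",
    "Dec 14 15:09:27 h16 VirtualDomain(prm_xen_test-jeos)[4850]: INFO: "
    "Domain test-jeos already stopped.",
]
print(find_multi_active(sample))
```

In practice the input would be the lines of the system log on the DC (h19 here) rather than the hard-coded sample.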

Regards,
Ulrich

>>> Ulrich Windl wrote on 14.12.2020 at 08:15 in message <5FD7110D.D09 : 161 :
60728>:
> Hi!
> 
> Another word of warning regarding VirtualDomain: While configuring a 3-node 
> cluster with SLES15 SP2 for Xen PVM (using libvirt and the VirtualDomain RA), 
> I had created a test VM using BtrFS.
> At some point during testing the cluster ended up with the test VM running 
> on more than one node (for reasons still to be examined). Only after a 
> "crm resource refresh" (reprobe) did the cluster try to fix the problem.
> Well, at some point the VM wouldn't start any more, because the BtrFS used 
> for everything (SLES default) was corrupted in a way that seems 
> unrecoverable, regardless of how many subvolumes and snapshots of those may 
> exist.
> 
> My initial guess is that the libvirt stack and VirtualDomain are less 
> reliable than the old Xen method and RA.
> 
> Regards,
> Ulrich