[ClusterLabs] HA problem: No live migration when setting node on standby

Vladislav Bogdanov bubble at hoster-ok.com
Wed Apr 12 07:39:49 EDT 2023


On Wed, 2023-04-12 at 14:04 +0300, Andrei Borzenkov wrote:
> On Wed, Apr 12, 2023 at 1:21 PM Vladislav Bogdanov
> <bubble at hoster-ok.com> wrote:
> > 
> > Hi,
> > 
> > Just add the Master role for the drbd resource in the colocation.
> > The default is Started (or Slave).
> > 
> 
> Could you elaborate why it is needed? The problem is not leaving the
> resource on the node with a demoted instance - when the node goes
> into standby, all resources must be evacuated from it anyway. How
> does collocating the VM with the master change that?

Just experience. Constraints that are inconsistent with each other
touch many corner cases in the code, especially in extreme
circumstances like a node going to standby, which usually involves
several transitions.

For me that is just a rule of thumb:
colocate VM:Started with drbd:Master
order drbd:promote then VM:start
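
Applied to the configuration quoted below, that would mean adding a
colocation of the VM with the promoted DRBD instance, roughly along
these lines (just a sketch; the constraint id is invented, and
depending on the crmsh/Pacemaker version the role keyword may be
spelled Master or Promoted):

    colocation colo_vm-alarmanlage-with-drbd-master inf: pri-vm-alarmanlage mas-drbd-alarmanlage:Master

combined with the order constraint that is already there
(mas-drbd-alarmanlage:promote before pri-vm-alarmanlage:start), so
that placement and ordering are consistent with each other.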



> 
> > 
> > Philip Schiller <p.schiller at plusoptix.de> wrote on 12 April 2023
> > at 11:28:57:
> > > 
> > > 
> > > Hi All,
> > > 
> > > I am using a simple two-node cluster with Zvol -> DRBD -> Virsh in
> > > primary/primary mode (necessary for live migration). My
> > > configuration:
> > > primitive pri-vm-alarmanlage VirtualDomain \
> > >         params config="/etc/libvirt/qemu/alarmanlage.xml" hypervisor="qemu:///system" migration_transport=ssh \
> > >         meta allow-migrate=true target-role=Started is-managed=true \
> > >         op monitor interval=0 timeout=120 \
> > >         op start interval=0 timeout=120 \
> > >         op stop interval=0 timeout=1800 \
> > >         op migrate_to interval=0 timeout=1800 \
> > >         op migrate_from interval=0 timeout=1800 \
> > >         utilization cpu=2 hv_memory=4096
> > > ms mas-drbd-alarmanlage pri-drbd-alarmanlage \
> > >         meta clone-max=2 promoted-max=2 notify=true promoted-node-max=1 clone-node-max=1 interleave=true target-role=Started is-managed=true
> > > colocation colo_mas_drbd_alarmanlage_with_clo_pri_zfs_drbd-storage inf: mas-drbd-alarmanlage clo-pri-zfs-drbd_storage
> > > location location-pri-vm-alarmanlage-s0-200 pri-vm-alarmanlage 200: s1
> > > order ord_pri-alarmanlage-after-mas-drbd-alarmanlage Mandatory: mas-drbd-alarmanlage:promote pri-vm-alarmanlage:start
> > > 
> > > So to summarize:
> > > - a resource for Virsh
> > > - a master/slave DRBD resource for the VM filesystem
> > > - an "order" directive to start the VM after DRBD has been
> > > promoted.
> > > 
> > > Node startup is OK, the VM is started after DRBD is promoted.
> > > Migration with virsh or via crm (crm resource move
> > > pri-vm-alarmanlage s0) works fine.
> > > 
> > > Node standby is problematic. Assuming the Virsh VM runs on node
> > > s1:
> > > 
> > > When putting node s1 into standby while node s0 is active, a live
> > > migration is started, BUT in the same second pacemaker tries to
> > > demote the DRBD volumes on s1 (while the live migration is still
> > > in progress).
> > > 
> > > All this results in the VM being stopped on s1 and started on s0.
> > > 
> > > I do not understand why pacemaker demotes/stops the DRBD volumes
> > > before the VM is migrated.
> > > Do I need additional constraints?
> > > 
> > > Setup is done with
> > > - Corosync Cluster Engine, version '3.1.6'
> > > - Pacemaker 2.1.2
> > > - Ubuntu 22.04.2 LTS
> > > 
> > > Thanks for your help,
> > > 
> > > with kind regards Philip
> > > 


