[ClusterLabs] HA problem: No live migration when setting node on standby

Philip Schiller p.schiller at plusoptix.de
Wed Apr 12 05:28:48 EDT 2023


Hi All,

I am using a simple two-node cluster with a Zvol -> DRBD -> Virsh stack,
with DRBD in primary/primary mode (necessary for live migration).  My configuration:

primitive pri-vm-alarmanlage VirtualDomain \
         params config="/etc/libvirt/qemu/alarmanlage.xml" hypervisor="qemu:///system" migration_transport=ssh \
         meta allow-migrate=true target-role=Started is-managed=true \
         op monitor interval=0 timeout=120 \
         op start interval=0 timeout=120 \
         op stop interval=0 timeout=1800 \
         op migrate_to interval=0 timeout=1800 \
         op migrate_from interval=0 timeout=1800 \
         utilization cpu=2 hv_memory=4096
ms mas-drbd-alarmanlage pri-drbd-alarmanlage \
         meta clone-max=2 promoted-max=2 notify=true promoted-node-max=1 clone-node-max=1 interleave=true target-role=Started is-managed=true
colocation colo_mas_drbd_alarmanlage_with_clo_pri_zfs_drbd-storage inf: mas-drbd-alarmanlage clo-pri-zfs-drbd_storage
location location-pri-vm-alarmanlage-s0-200 pri-vm-alarmanlage 200: s1
order ord_pri-alarmanlage-after-mas-drbd-alarmanlage Mandatory: mas-drbd-alarmanlage:promote pri-vm-alarmanlage:start

So to summarize:
- A resource for Virsh.
- A Master/Slave DRBD resource for the VM filesystem.
- An "order" directive to start the VM after DRBD has been promoted.
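
For reference, Pacemaker ordering constraints are symmetrical by default, so the "order" directive above also implies the reverse sequence on shutdown. The fragment below is only a sketch that spells out that default; the trailing symmetrical=true is not part of the original configuration and is written here purely for illustration.

```text
# Same constraint as in the configuration above, with the default made explicit.
# Implied inverse: pri-vm-alarmanlage:stop is ordered before mas-drbd-alarmanlage:demote.
order ord_pri-alarmanlage-after-mas-drbd-alarmanlage Mandatory: mas-drbd-alarmanlage:promote pri-vm-alarmanlage:start symmetrical=true
```

This is why a demote of DRBD while the VM is still active on the same node is unexpected.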

Node startup is OK: the VM is started after DRBD is promoted.
Migration with virsh or via crm (crm resource move pri-vm-alarmanlage s0) works fine.

Node standby is problematic. Assume the Virsh VM runs on node s1:

When I put node s1 in standby while node s0 is active, a live migration
is started, BUT in the same second Pacemaker tries to demote the DRBD
volumes on s1 (while the live migration is still in progress).

All this results in the VM being stopped on s1 and started on s0 instead of being live-migrated.

I do not understand why Pacemaker demotes/stops the DRBD volumes before the VM is migrated.
Do I need additional constraints?
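
To make the question concrete: the configuration excerpt above contains an order constraint between the VM and DRBD but no colocation between them. One candidate to test would be a colocation of the VM with the promoted DRBD role, sketched below. The constraint name is hypothetical, and it is not confirmed that this changes the standby behaviour.

```text
# Hypothetical additional constraint, untested: keep the VM only where DRBD is promoted.
# The role is spelled "Master" here; Pacemaker 2.1 also knows the newer name "Promoted".
# Which spelling the installed crm shell accepts is an assumption to verify.
colocation colo_pri_vm_alarmanlage_with_mas_drbd_alarmanlage inf: pri-vm-alarmanlage mas-drbd-alarmanlage:Master
```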

Setup is done with
- Corosync Cluster Engine, version '3.1.6'
- Pacemaker 2.1.2
- Ubuntu 22.04.2 LTS

Thanks for your help,

with kind regards Philip