[ClusterLabs] HA problem: No live migration when setting node on standby

Andrei Borzenkov arvidjaar at gmail.com
Wed Apr 12 07:04:03 EDT 2023


On Wed, Apr 12, 2023 at 1:21 PM Vladislav Bogdanov <bubble at hoster-ok.com> wrote:
>
> Hi,
>
> Just add the Master role for the DRBD resource in the colocation. The default is Started (or Slave).
>

Could you elaborate on why that is needed? The problem is not that the
resource is left on a node with a demoted instance - when the node goes
into standby, all resources must be evacuated from it anyway. How does
collocating the VM with the master change that?
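
For context, in crm shell syntax a colocation that ties the VM to the
promoted DRBD instance would read roughly like the sketch below (the
constraint ID is invented for illustration and is not part of the
configuration quoted further down):

colocation colo_vm_with_drbd_master inf: pri-vm-alarmanlage mas-drbd-alarmanlage:Master

Without the :Master qualifier the VM would only be tied to a started -
possibly demoted - DRBD instance, which is the "Started" default mentioned
above.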

>
> Philip Schiller <p.schiller at plusoptix.de> wrote on 12 April 2023 at 11:28:57:
>>
>>
>> Hi All,
>>
>> I am using a simple two-node cluster with Zvol -> DRBD -> Virsh in
>> primary/primary mode (necessary for live migration).  My configuration:
>>
>> primitive pri-vm-alarmanlage VirtualDomain \
>>         params config="/etc/libvirt/qemu/alarmanlage.xml" hypervisor="qemu:///system" migration_transport=ssh \
>>         meta allow-migrate=true target-role=Started is-managed=true \
>>         op monitor interval=0 timeout=120 \
>>         op start interval=0 timeout=120 \
>>         op stop interval=0 timeout=1800 \
>>         op migrate_to interval=0 timeout=1800 \
>>         op migrate_from interval=0 timeout=1800 \
>>         utilization cpu=2 hv_memory=4096
>> ms mas-drbd-alarmanlage pri-drbd-alarmanlage \
>>         meta clone-max=2 promoted-max=2 notify=true promoted-node-max=1 clone-node-max=1 interleave=true target-role=Started is-managed=true
>> colocation colo_mas_drbd_alarmanlage_with_clo_pri_zfs_drbd-storage inf: mas-drbd-alarmanlage clo-pri-zfs-drbd_storage
>> location location-pri-vm-alarmanlage-s0-200 pri-vm-alarmanlage 200: s1
>> order ord_pri-alarmanlage-after-mas-drbd-alarmanlage Mandatory: mas-drbd-alarmanlage:promote pri-vm-alarmanlage:start
>>
>> So to summarize:
>> - A resource for the virsh VM
>> - A Master/Slave DRBD resource for the VM filesystem.
>> - An "order" directive to start the VM after DRBD has been promoted.
>>
>> Node startup is ok, the VM is started after DRBD is promoted.
>> Migration with virsh or via crm (crm resource move pri-vm-alarmanlage s0) works fine.
>>
>> Node standby is problematic. Assuming the virsh VM runs on node s1:
>>
>> When putting node s1 into standby while node s0 is active, a live migration
>> is started, BUT in the same second Pacemaker tries to demote the DRBD
>> volumes on s1 (while the live migration is still in progress).
>>
>> All this results in the VM being stopped on s1 and started on s0.
>>
>> I do not understand why Pacemaker demotes/stops the DRBD volumes before the VM has been migrated.
>> Do I need additional constraints?
>>
>> Setup is done with
>> - Corosync Cluster Engine, version '3.1.6'
>> - Pacemaker 2.1.2
>> - Ubuntu 22.04.2 LTS
>>
>> Thanks for your help,
>>
>> with kind regards Philip
>>
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
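
For reference, the configuration quoted above has an order constraint but no
colocation at all between pri-vm-alarmanlage and mas-drbd-alarmanlage; the
colocation being discussed is sketched earlier in this message. To reproduce
the standby scenario and watch what the scheduler decides, the standard
crmsh/crm_mon tools are enough (node names as in the thread):

crm node standby s1     # triggers the evacuation and the attempted migration
crm_mon -1              # one-shot view of the resulting transition
crm node online s1      # bring the node back afterwards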


More information about the Users mailing list