[ClusterLabs] The cluster is having fun ;-)

Tue May 4 02:05:18 EDT 2021

Hi!

I'm using utilization-based resource placement. Yesterday I shut down one of the three nodes, and I increased the RAM and vCPUs of one VM (v15) as part of restarting it (full stop/start).

I have a rule that sets the stickiness to zero for an hour in the evening, allowing the VMs to live-migrate to rebalance load. This is what happened:
May 03 20:07:46 h16 pacemaker-controld[6919]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice: Watchdog will be used via SBD if fencing is required and stonith-watchdog-timeout is nonzero
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice:  * Migrate    prm_xen_v07           ( h16 -> h19 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice:  * Migrate    prm_xen_v15           ( h19 -> h16 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice:  * Migrate    prm_xen_v16           ( h16 -> h19 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice:  * Migrate    prm_xen_v12           ( h16 -> h19 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice:  * Migrate    prm_xen_v09           ( h19 -> h16 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice:  * Migrate    prm_xen_v14           ( h16 -> h19 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice:  * Migrate    prm_xen_v13           ( h19 -> h16 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice:  * Migrate    prm_xen_v17           ( h16 -> h19 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice:  * Migrate    prm_xen_test-jeos1         ( h19 -> h16 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice:  * Migrate    prm_xen_test-jeos2         ( h19 -> h16 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice:  * Migrate    prm_xen_test-jeos4         ( h19 -> h16 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice:  * Migrate    prm_xen_test-jeos5         ( h16 -> h19 )

Those test-jeos VMs are just tests, all using the very same utilization parameters, so it's amazing that the cluster kind of "ring shifts" them.
v07 has a rather small assignment, and v12, v13 and v09 also use the same assignment.
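
To make the swap pattern easier to see, here is a rough Python sketch that tallies the planned migrations by direction from schedulerd log text. The function name tally_migrations and the regular expression are my own assumptions, written to fit the line format shown above; this is not a Pacemaker tool, and the sample holds only four of the twelve lines.

```python
import re
from collections import defaultdict

# Assumed pattern for schedulerd action lines of the form:
#   "... notice:  * Migrate    prm_xen_v07           ( h16 -> h19 )"
MIGRATE_RE = re.compile(r"\*\s+Migrate\s+(\S+)\s+\(\s*(\S+)\s*->\s*(\S+)\s*\)")


def tally_migrations(log_text):
    """Group migrated resources by (source node, target node)."""
    by_direction = defaultdict(list)
    for line in log_text.splitlines():
        match = MIGRATE_RE.search(line)
        if match:
            resource, source, target = match.groups()
            by_direction[(source, target)].append(resource)
    return dict(by_direction)


# A few lines copied from the log above; paste the full log to tally everything.
SAMPLE = """\
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice:  * Migrate    prm_xen_v07           ( h16 -> h19 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice:  * Migrate    prm_xen_v15           ( h19 -> h16 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice:  * Migrate    prm_xen_test-jeos1         ( h19 -> h16 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]:  notice:  * Migrate    prm_xen_test-jeos5         ( h16 -> h19 )
"""

if __name__ == "__main__":
    for (source, target), resources in sorted(tally_migrations(SAMPLE).items()):
        print(f"{source} -> {target}: {len(resources)} ({', '.join(resources)})")
```

Counting the twelve Migrate lines in the log by hand gives six moves in each direction (h16 -> h19: v07, v12, v14, v16, v17, test-jeos5; h19 -> h16: v09, v13, v15, test-jeos1, test-jeos2, test-jeos4).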

At least the resource placement on the active nodes at the end is rather balanced:
Remaining: h16 capacity: utl_ram=108820 utl_cpu=220
Remaining: h19 capacity: utl_ram=108820 utl_cpu=200

Regards,
Ulrich