[ClusterLabs] The cluster is having fun ;-)
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Tue May 4 02:05:18 EDT 2021
Hi!
I'm using utilization-based resource placement. Yesterday I shut down one node of three, and I increased the RAM and vCPUs of one VM (v15) as part of restarting it (full stop/start).
I have a rule that sets the stickiness to zero for an hour in the evening, allowing the VMs to live-migrate to rebalance load. This is what happened:
May 03 20:07:46 h16 pacemaker-controld[6919]: notice: State transition S_IDLE -> S_POLICY_ENGINE
May 03 20:07:46 h16 pacemaker-schedulerd[6918]: notice: Watchdog will be used via SBD if fencing is required and stonith-watchdog-timeout is nonzero
May 03 20:07:46 h16 pacemaker-schedulerd[6918]: notice: * Migrate prm_xen_v07 ( h16 -> h19 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]: notice: * Migrate prm_xen_v15 ( h19 -> h16 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]: notice: * Migrate prm_xen_v16 ( h16 -> h19 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]: notice: * Migrate prm_xen_v12 ( h16 -> h19 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]: notice: * Migrate prm_xen_v09 ( h19 -> h16 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]: notice: * Migrate prm_xen_v14 ( h16 -> h19 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]: notice: * Migrate prm_xen_v13 ( h19 -> h16 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]: notice: * Migrate prm_xen_v17 ( h16 -> h19 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]: notice: * Migrate prm_xen_test-jeos1 ( h19 -> h16 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]: notice: * Migrate prm_xen_test-jeos2 ( h19 -> h16 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]: notice: * Migrate prm_xen_test-jeos4 ( h19 -> h16 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]: notice: * Migrate prm_xen_test-jeos5 ( h16 -> h19 )
Those test-jeos VMs are just tests, all using the very same utilization parameters, so it's amazing that the cluster kind of "ring-shifts" them between the two nodes.
v07 has a rather small assignment, and v12, v13 and v09 also use the same assignment.
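To make the swap pattern in the schedulerd output above easier to see, here is a minimal Python sketch that tallies the planned migrations per direction. It is not part of any Pacemaker tooling; the regular expression, the function name `tally_migrations` and the two-line `SAMPLE` excerpt are assumptions made for this illustration, based only on the log format shown above.

```python
import re
from collections import Counter

# Matches schedulerd action lines such as:
#   "... notice: * Migrate prm_xen_v07 ( h16 -> h19 )"
# (pattern is an assumption derived from the log lines quoted in this post)
MIGRATE_RE = re.compile(r"\*\s+Migrate\s+(\S+)\s+\(\s*(\S+)\s*->\s*(\S+)\s*\)")


def tally_migrations(log_text):
    """Count planned migrations per (source, target) node pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MIGRATE_RE.search(line)
        if match:
            _resource, source, target = match.groups()
            counts[(source, target)] += 1
    return counts


# Two lines copied from the log above, used as a small self-test input.
SAMPLE = """\
May 03 20:07:46 h16 pacemaker-schedulerd[6918]: notice: * Migrate prm_xen_v07 ( h16 -> h19 )
May 03 20:07:46 h16 pacemaker-schedulerd[6918]: notice: * Migrate prm_xen_v15 ( h19 -> h16 )
"""

if __name__ == "__main__":
    for (source, target), n in sorted(tally_migrations(SAMPLE).items()):
        print(f"{source} -> {target}: {n}")
```

Fed the twelve Migrate lines from the transition above, the tally comes out as six migrations from h16 to h19 and six from h19 to h16, i.e. half of the VMs simply trade places.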
At least the resource placement on the active nodes at the end is rather balanced:
Remaining: h16 capacity: utl_ram=108820 utl_cpu=220
Remaining: h19 capacity: utl_ram=108820 utl_cpu=200
Regards,
Ulrich