[ClusterLabs] [EXT] Resource is Unbalanced After Powering Off One Node

Windl, Ulrich u.windl at ukr.de
Wed Jun 11 06:12:52 UTC 2025


Hi!

Generally I use "crm_mon -1Arfj" to see the cluster status, and I suspect it may be location constraints or stickiness preventing resource balancing. Without the config it's hard to guess, however.

Kind regards,
Ulrich Windl

From: Users <users-bounces at clusterlabs.org> On Behalf Of chenzufei at gmail.com
Sent: Friday, June 6, 2025 10:20 AM
To: users <users at clusterlabs.org>
Subject: [EXT] [ClusterLabs] Resource is Unbalanced After Powering Off One Node



Hi all,
I am writing to report an issue with uneven resource migration in our Lustre cluster. Below are the details:

1. Background:
We have 3 physical nodes, each hosting 2 virtual machines: lustre-mds-nodexx (containing 2 MDTs) and lustre-oss-nodexx (containing 8 OSTs, plus the MGS on one of them).
We are using Lustre version 2.15.5 along with Pacemaker (2.1.0) for cluster management.

2. Problem:
After powering off lustre-oss-node144 with the command "virsh destroy lustre-oss-node144", its resources did not migrate evenly. All 8 of its OSTs moved to lustre-oss-node31, and none moved to lustre-oss-node135.

3. Resource Status Before and After Powering Off lustre-oss-node144:
Before:
[root@lustre-oss-node31 ~]# pcs status
Cluster name: oss_cluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: lustre-oss-node144 (version 2.1.7-5.el8_10-0f7f88312) - partition with quorum
  * Last updated: Fri Jun  6 14:10:54 2025 on lustre-oss-node31
  * Last change:  Fri Jun  6 14:06:46 2025 by root via root on lustre-oss-node31
  * 3 nodes configured
  * 28 resource instances configured

Node List:
  * Online: [ lustre-oss-node31 lustre-oss-node135 lustre-oss-node144 ]

Full List of Resources:
  * vmfence_lustre-oss-node31   (stonith:fence_xvm):     Started lustre-oss-node144
  * vmfence_lustre-oss-node144  (stonith:fence_xvm):     Started lustre-oss-node135
  * vmfence_lustre-oss-node135  (stonith:fence_xvm):     Started lustre-oss-node31
  * mgt (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-0       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-3       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-6       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-9       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-12      (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-15      (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-18      (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-21      (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-1       (ocf::heartbeat:Filesystem):     Started lustre-oss-node144
  * ost-4       (ocf::heartbeat:Filesystem):     Started lustre-oss-node144
  * ost-7       (ocf::heartbeat:Filesystem):     Started lustre-oss-node144
  * ost-10      (ocf::heartbeat:Filesystem):     Started lustre-oss-node144
  * ost-13      (ocf::heartbeat:Filesystem):     Started lustre-oss-node144
  * ost-16      (ocf::heartbeat:Filesystem):     Started lustre-oss-node144
  * ost-19      (ocf::heartbeat:Filesystem):     Started lustre-oss-node144
  * ost-22      (ocf::heartbeat:Filesystem):     Started lustre-oss-node144
  * ost-2       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-5       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-8       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-11      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-14      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-17      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-20      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-23      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31

After:
[root@lustre-oss-node31 ~]# date; pcs status
Fri Jun  6 14:12:50 CST 2025
Cluster name: oss_cluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: lustre-oss-node135 (version 2.1.7-5.el8_10-0f7f88312) - partition with quorum
  * Last updated: Fri Jun  6 14:12:50 2025 on lustre-oss-node31
  * Last change:  Fri Jun  6 14:06:46 2025 by root via root on lustre-oss-node31
  * 3 nodes configured
  * 28 resource instances configured

Node List:
  * Online: [ lustre-oss-node31 lustre-oss-node135 ]
  * OFFLINE: [ lustre-oss-node144 ]

Full List of Resources:
  * vmfence_lustre-oss-node31   (stonith:fence_xvm):     Started lustre-oss-node135
  * vmfence_lustre-oss-node144  (stonith:fence_xvm):     Started lustre-oss-node135
  * vmfence_lustre-oss-node135  (stonith:fence_xvm):     Started lustre-oss-node31
  * mgt (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-0       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-3       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-6       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-9       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-12      (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-15      (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-18      (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-21      (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-1       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-4       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-7       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-10      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-13      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-16      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-19      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-22      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-2       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-5       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-8       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-11      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-14      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-17      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-20      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-23      (ocf::heartbeat:Filesystem):     Started lustre-oss-node31

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
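
To make the imbalance easier to see at a glance, the "Started" lines of a pcs status listing can be counted per node. The following is only a minimal sketch: the function name count_started_per_node, the agent_prefix parameter, and the regular expression are illustrative assumptions, not part of pcs or Pacemaker, and the embedded sample is a short excerpt of the "After" listing rather than the full output.

```python
import re
from collections import Counter

# Matches resource lines such as:
#   "  * ost-1       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31"
# The pattern is an assumption based on the listing format shown in this post.
RESOURCE_LINE = re.compile(r"^\s*\*\s+(\S+)\s+\(([^)]+)\):\s+Started\s+(\S+)\s*$")


def count_started_per_node(status_text, agent_prefix="ocf:"):
    """Count started resources per node, keeping only agents with the given prefix.

    With the default prefix, stonith (fence) resources are left out so that
    only the Filesystem resources (MGT and OSTs) are counted.
    """
    counts = Counter()
    for line in status_text.splitlines():
        match = RESOURCE_LINE.match(line)
        if match and match.group(2).startswith(agent_prefix):
            counts[match.group(3)] += 1
    return counts


# Short excerpt of the "After" listing, used here only as sample input.
SAMPLE = """\
  * vmfence_lustre-oss-node31   (stonith:fence_xvm):     Started lustre-oss-node135
  * mgt (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-0       (ocf::heartbeat:Filesystem):     Started lustre-oss-node135
  * ost-1       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
  * ost-2       (ocf::heartbeat:Filesystem):     Started lustre-oss-node31
"""

if __name__ == "__main__":
    for node, count in sorted(count_started_per_node(SAMPLE).items()):
        print(node, count)
```

Applied to the complete "After" listing above, the same count gives 17 Filesystem resources on lustre-oss-node31 (16 OSTs plus mgt) and 8 on lustre-oss-node135.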

4. Logs:
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-1                         (  lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-4                         (  lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-7                         (  lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-10                        (  lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-13                        (  lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-16                        (  lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-19                        (  lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-22                        (  lustre-oss-node144 -> lustre-oss-node31 )

5. Attachments:
The attached files include the configuration (config.txt) and the logs (node135.log) covering the uneven migration.
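
For anyone going through a scheduler log such as node135.log, the "Actions: Move" lines can be tallied by source and target node to confirm where the resources went. This is a rough sketch only: the function name tally_moves and the regular expression are illustrative assumptions derived from the log lines quoted in this post, not a Pacemaker interface, and the sample input is just two of those lines.

```python
import re
from collections import Counter

# Matches scheduler lines of the form quoted in this post, for example:
#   "... notice: Actions: Move       ost-1      (  lustre-oss-node144 -> lustre-oss-node31 )"
# The pattern is an assumption based on those lines only.
MOVE_LINE = re.compile(r"Actions:\s+Move\s+(\S+)\s+\(\s*(\S+)\s+->\s+(\S+)\s*\)")


def tally_moves(log_text):
    """Return a Counter keyed by (source_node, target_node) for each Move action."""
    moves = Counter()
    for line in log_text.splitlines():
        match = MOVE_LINE.search(line)
        if match:
            _resource, source, target = match.groups()
            moves[(source, target)] += 1
    return moves


# Two of the log lines quoted above, used here only as sample input.
SAMPLE_LOG = """\
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-1                         (  lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item)        notice: Actions: Move       ost-4                         (  lustre-oss-node144 -> lustre-oss-node31 )
"""

if __name__ == "__main__":
    for (source, target), count in tally_moves(SAMPLE_LOG).items():
        print(f"{source} -> {target}: {count}")
```

For the eight Move lines quoted in this post, every entry has lustre-oss-node144 as the source and lustre-oss-node31 as the target.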

Thank you for your attention and support.
Best regards


________________________________
chenzufei at gmail.com