[ClusterLabs] [EXT] Resource is Unbalanced After Powering Off One Node
Windl, Ulrich
u.windl at ukr.de
Wed Jun 11 06:12:52 UTC 2025
Hi!
Generally I use "crm_mon -1Arfj" to check the cluster status. I suspect that location constraints or resource stickiness may be preventing balanced placement, but without the configuration it is hard to guess.
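Ulrich's hypothesis can be illustrated with a toy model. The Python sketch below is NOT Pacemaker's scheduler. It only assumes, loosely following the documented behaviour of placement-strategy=default (equal scores lead to an even spread), that each orphaned resource goes to the surviving node with the highest score and that equal scores fall back to the less-loaded node. The preference score of 100 is hypothetical and stands in for whatever location constraint or other score source the real configuration might contain; the resource layout is taken from the pcs status output in the original report.

```python
from collections import Counter

def place(orphans, nodes, current, prefs=None):
    """Toy score-based placement (illustration only, not Pacemaker's algorithm).

    orphans: resources that lost their node, placed in the given order
    nodes:   surviving nodes; on a full tie the earlier node in this list wins
    current: {resource: node} for resources that are not moving
    prefs:   hypothetical {(resource, node): score}, e.g. a location constraint
    """
    prefs = prefs or {}
    load = Counter(current.values())
    placement = {}
    for rsc in orphans:
        # Highest preference wins; equal preferences fall back to the emptier node.
        best = max(nodes, key=lambda n: (prefs.get((rsc, n), 0), -load[n]))
        placement[rsc] = best
        load[best] += 1
    return placement

nodes = ["lustre-oss-node31", "lustre-oss-node135"]
orphans = [f"ost-{i}" for i in range(1, 23, 3)]  # ost-1, ost-4, ..., ost-22
current = {f"ost-{i}": "lustre-oss-node135" for i in range(0, 22, 3)}
current.update({f"ost-{i}": "lustre-oss-node31" for i in range(2, 24, 3)})
current["mgt"] = "lustre-oss-node31"

# No extra scores: orphans alternate between the two survivors.
even = place(orphans, nodes, current)
print(Counter(even.values()))

# Hypothetical constraint preferring node31 for node144's OSTs.
prefs = {(rsc, "lustre-oss-node31"): 100 for rsc in orphans}
skewed = place(orphans, nodes, current, prefs)
print(Counter(skewed.values()))
```

With no extra scores the eight orphaned OSTs split 4/4 in this model; a single hypothetical preference for lustre-oss-node31 sends all eight there, which resembles the reported outcome. Checking the real allocation scores on the cluster, for example with "crm_simulate -sL", would show whether anything like this is actually in play.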
Kind regards,
Ulrich Windl
From: Users <users-bounces at clusterlabs.org> On Behalf Of chenzufei at gmail.com
Sent: Friday, June 6, 2025 10:20 AM
To: users <users at clusterlabs.org>
Subject: [EXT] [ClusterLabs] Resource is Unbalanced After Powering Off One Node
Hi all,
I am writing to report an issue with uneven resource migration in our Lustre cluster. Below are the details:
1. Background:
We have 3 physical nodes, each hosting 2 virtual machines: lustre-mds-nodexx (with 2 MDTs) and lustre-oss-nodexx (with 8 OSTs; one of the OSS VMs also hosts the MGS).
We are using Lustre 2.15.5 together with Pacemaker (2.1.0) for cluster management.
2. Problem:
After powering off lustre-oss-node144 with the command virsh destroy lustre-oss-node144, its resources were not redistributed evenly: all eight of its OSTs (ost-1, ost-4, ost-7, ost-10, ost-13, ost-16, ost-19, ost-22) moved to lustre-oss-node31, and none moved to lustre-oss-node135.
3. Resource status before and after powering off lustre-oss-node144:
(1) Before:
[root@lustre-oss-node31 ~]# pcs status
Cluster name: oss_cluster
Cluster Summary:
* Stack: corosync (Pacemaker is running)
* Current DC: lustre-oss-node144 (version 2.1.7-5.el8_10-0f7f88312) - partition with quorum
* Last updated: Fri Jun 6 14:10:54 2025 on lustre-oss-node31
* Last change: Fri Jun 6 14:06:46 2025 by root via root on lustre-oss-node31
* 3 nodes configured
* 28 resource instances configured
Node List:
* Online: [ lustre-oss-node31 lustre-oss-node135 lustre-oss-node144 ]
Full List of Resources:
* vmfence_lustre-oss-node31 (stonith:fence_xvm): Started lustre-oss-node144
* vmfence_lustre-oss-node144 (stonith:fence_xvm): Started lustre-oss-node135
* vmfence_lustre-oss-node135 (stonith:fence_xvm): Started lustre-oss-node31
* mgt (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-0 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-3 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-6 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-9 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-12 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-15 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-18 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-21 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-1 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-4 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-7 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-10 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-13 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-16 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-19 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-22 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-2 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-5 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-8 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-11 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-14 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-17 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-20 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-23 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
(2) After:
[root@lustre-oss-node31 ~]# date; pcs status
Fri Jun 6 14:12:50 CST 2025
Cluster name: oss_cluster
Cluster Summary:
* Stack: corosync (Pacemaker is running)
* Current DC: lustre-oss-node135 (version 2.1.7-5.el8_10-0f7f88312) - partition with quorum
* Last updated: Fri Jun 6 14:12:50 2025 on lustre-oss-node31
* Last change: Fri Jun 6 14:06:46 2025 by root via root on lustre-oss-node31
* 3 nodes configured
* 28 resource instances configured
Node List:
* Online: [ lustre-oss-node31 lustre-oss-node135 ]
* OFFLINE: [ lustre-oss-node144 ]
Full List of Resources:
* vmfence_lustre-oss-node31 (stonith:fence_xvm): Started lustre-oss-node135
* vmfence_lustre-oss-node144 (stonith:fence_xvm): Started lustre-oss-node135
* vmfence_lustre-oss-node135 (stonith:fence_xvm): Started lustre-oss-node31
* mgt (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-0 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-3 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-6 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-9 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-12 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-15 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-18 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-21 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-1 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-4 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-7 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-10 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-13 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-16 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-19 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-22 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-2 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-5 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-8 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-11 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-14 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-17 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-20 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-23 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
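To quantify the imbalance, a small Python sketch can parse the resource lines of the two pcs status snapshots, count data resources per node, and list what moved. The regular expression assumes the exact "* name (class:type): Started node" layout shown above, and skipping fence devices by the "vmfence_" name prefix is an assumption based on the resource names in this cluster. The sample input is a five-line excerpt of the listings above.

```python
import re
from collections import Counter

# Matches lines such as "* ost-1 (ocf::heartbeat:Filesystem): Started lustre-oss-node144"
RESOURCE_RE = re.compile(r"^\*\s+(\S+)\s+\([^)]*\):\s+Started\s+(\S+)")

def parse_status(text):
    """Return {resource: node} for every 'Started' resource line in pcs status text."""
    placement = {}
    for line in text.splitlines():
        match = RESOURCE_RE.match(line.strip())
        if match:
            placement[match.group(1)] = match.group(2)
    return placement

def per_node(placement, skip_prefix="vmfence_"):
    """Count resources per node, ignoring fence devices (identified by name prefix)."""
    return Counter(node for rsc, node in placement.items()
                   if not rsc.startswith(skip_prefix))

def moved(before, after):
    """Return {resource: (old_node, new_node)} for resources whose node changed."""
    return {rsc: (before[rsc], after[rsc])
            for rsc in before
            if rsc in after and before[rsc] != after[rsc]}

before = parse_status("""\
* vmfence_lustre-oss-node31 (stonith:fence_xvm): Started lustre-oss-node144
* mgt (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-0 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-1 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-2 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
""")
after = parse_status("""\
* vmfence_lustre-oss-node31 (stonith:fence_xvm): Started lustre-oss-node135
* mgt (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-0 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-1 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-2 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
""")
print(per_node(after))
print(moved(before, after))
```

Fed the complete After listing above, per_node would report 17 data resources on lustre-oss-node31 (mgt plus 16 OSTs) and 8 on lustre-oss-node135.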
4. Logs:
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-1 ( lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-4 ( lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-7 ( lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-10 ( lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-13 ( lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-16 ( lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-19 ( lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-22 ( lustre-oss-node144 -> lustre-oss-node31 )
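When the full node135.log is long, a short Python sketch can pull the planned moves out of the scheduler log and count them per destination. The regular expression assumes the "Actions: Move <resource> ( <from> -> <to> )" wording seen in the lines above; any other action type is simply ignored.

```python
import re
from collections import Counter

# Matches scheduler lines such as
# "... notice: Actions: Move ost-1 ( lustre-oss-node144 -> lustre-oss-node31 )"
MOVE_RE = re.compile(r"Actions:\s+Move\s+(\S+)\s+\(\s*(\S+)\s+->\s+(\S+)\s*\)")

def summarize_moves(log_text):
    """Return (moves, per-destination counts) for 'Move' actions in scheduler log text."""
    moves = []
    for line in log_text.splitlines():
        match = MOVE_RE.search(line)
        if match:
            moves.append(match.groups())  # (resource, source node, destination node)
    return moves, Counter(dest for _, _, dest in moves)

log = """\
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-1 ( lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-4 ( lustre-oss-node144 -> lustre-oss-node31 )
"""
moves, by_dest = summarize_moves(log)
print(by_dest)
```

Run over all eight log lines above, it would report all eight moves going to lustre-oss-node31.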
5. Attachments:
The attached files contain the cluster configuration (config.txt) and the logs from lustre-oss-node135 (node135.log) covering the period of the uneven migration.
Thank you for your attention and support.
Best regards
________________________________
chenzufei at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20250611/39b0bdd5/attachment-0001.htm>