[ClusterLabs] Resource is Unbalanced After Powering Off One Node
chenzufei at gmail.com
Fri Jun 6 08:19:39 UTC 2025
Hi all,
I am writing to report an issue with uneven resource migration in our Lustre cluster. Below are the details:
1. Background:
We have 3 physical nodes, each hosting 2 virtual machines: lustre-mds-nodexx (containing 2 MDTs) and lustre-oss-nodexx (containing 8 OSTs, plus the MGS on one of them).
We are using Lustre version 2.15.5 with Pacemaker (2.1.0) for cluster management.
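2. Problem:
After powering off lustre-oss-node144 with the command virsh destroy lustre-oss-node144, its resources were not redistributed evenly across the two surviving nodes. All eight of its OSTs moved to lustre-oss-node31 and none to lustre-oss-node135, leaving lustre-oss-node31 with 16 OSTs plus the MGT and lustre-oss-node135 with 8 OSTs.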
3. Resource Status Before and After Powering Off lustre-oss-node144:
Before:
[root@lustre-oss-node31 ~]# pcs status
Cluster name: oss_cluster
Cluster Summary:
* Stack: corosync (Pacemaker is running)
* Current DC: lustre-oss-node144 (version 2.1.7-5.el8_10-0f7f88312) - partition with quorum
* Last updated: Fri Jun 6 14:10:54 2025 on lustre-oss-node31
* Last change: Fri Jun 6 14:06:46 2025 by root via root on lustre-oss-node31
* 3 nodes configured
* 28 resource instances configured
Node List:
* Online: [ lustre-oss-node31 lustre-oss-node135 lustre-oss-node144 ]
Full List of Resources:
* vmfence_lustre-oss-node31 (stonith:fence_xvm): Started lustre-oss-node144
* vmfence_lustre-oss-node144 (stonith:fence_xvm): Started lustre-oss-node135
* vmfence_lustre-oss-node135 (stonith:fence_xvm): Started lustre-oss-node31
* mgt (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-0 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-3 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-6 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-9 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-12 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-15 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-18 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-21 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-1 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-4 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-7 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-10 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-13 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-16 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-19 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-22 (ocf::heartbeat:Filesystem): Started lustre-oss-node144
* ost-2 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-5 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-8 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-11 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-14 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-17 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-20 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-23 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
After:
[root@lustre-oss-node31 ~]# date;pcs status
Fri Jun 6 14:12:50 CST 2025
Cluster name: oss_cluster
Cluster Summary:
* Stack: corosync (Pacemaker is running)
* Current DC: lustre-oss-node135 (version 2.1.7-5.el8_10-0f7f88312) - partition with quorum
* Last updated: Fri Jun 6 14:12:50 2025 on lustre-oss-node31
* Last change: Fri Jun 6 14:06:46 2025 by root via root on lustre-oss-node31
* 3 nodes configured
* 28 resource instances configured
Node List:
* Online: [ lustre-oss-node31 lustre-oss-node135 ]
* OFFLINE: [ lustre-oss-node144 ]
Full List of Resources:
* vmfence_lustre-oss-node31 (stonith:fence_xvm): Started lustre-oss-node135
* vmfence_lustre-oss-node144 (stonith:fence_xvm): Started lustre-oss-node135
* vmfence_lustre-oss-node135 (stonith:fence_xvm): Started lustre-oss-node31
* mgt (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-0 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-3 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-6 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-9 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-12 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-15 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-18 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-21 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
* ost-1 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-4 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-7 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-10 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-13 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-16 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-19 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-22 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-2 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-5 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-8 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-11 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-14 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-17 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-20 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
* ost-23 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
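To make the imbalance easy to check, the per-node resource counts can be tallied directly from the pcs status text. The following is only a minimal Python sketch and is not part of our setup: the function name count_started_per_node and the regular expression are illustrative assumptions based on the output format shown above. Fence devices are skipped by matching only agents that start with "ocf:".

```python
import re
from collections import Counter

# Matches resource lines from "pcs status", for example:
#   * ost-1 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
# Summary lines such as "* 3 nodes configured" do not match.
RESOURCE_LINE = re.compile(
    r"^\s*\*\s+(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+Started\s+(?P<node>\S+)\s*$"
)


def count_started_per_node(status_text, agent_prefix="ocf:"):
    """Count started resources per node, keeping only agents with the given prefix."""
    counts = Counter()
    for line in status_text.splitlines():
        match = RESOURCE_LINE.match(line)
        if match and match.group("agent").startswith(agent_prefix):
            counts[match.group("node")] += 1
    return counts


# A few lines copied from the "After" output above, used as sample input.
sample = """\
  * vmfence_lustre-oss-node135 (stonith:fence_xvm): Started lustre-oss-node31
  * mgt (ocf::heartbeat:Filesystem): Started lustre-oss-node31
  * ost-0 (ocf::heartbeat:Filesystem): Started lustre-oss-node135
  * ost-1 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
  * ost-2 (ocf::heartbeat:Filesystem): Started lustre-oss-node31
"""

for node, count in sorted(count_started_per_node(sample).items()):
    print(node, count)
```

Applied to the full After output, this should report 17 Filesystem resources on lustre-oss-node31 (16 OSTs plus mgt) and 8 on lustre-oss-node135.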
4. Logs:
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-1 ( lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-4 ( lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-7 ( lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-10 ( lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-13 ( lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-16 ( lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-19 ( lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-22 ( lustre-oss-node144 -> lustre-oss-node31 )
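The scheduler decisions can also be summarized from the log itself. This is a small Python sketch under the assumption that every move is logged in the "Actions: Move" form shown above; the name summarize_moves and the pattern are hypothetical helpers for illustration, not Pacemaker tooling.

```python
import re
from collections import Counter

# Matches the scheduler lines quoted above, for example:
#   ... notice: Actions: Move ost-1 ( lustre-oss-node144 -> lustre-oss-node31 )
MOVE_ACTION = re.compile(
    r"Actions: Move\s+(?P<resource>\S+)\s+\(\s*(?P<src>\S+)\s+->\s+(?P<dst>\S+)\s*\)"
)


def summarize_moves(log_text):
    """Count planned moves per (source node, destination node) pair."""
    moves = Counter()
    for line in log_text.splitlines():
        match = MOVE_ACTION.search(line)
        if match:
            moves[(match.group("src"), match.group("dst"))] += 1
    return moves


# Two lines copied from the log excerpt above, used as sample input.
sample_log = """\
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-1 ( lustre-oss-node144 -> lustre-oss-node31 )
Jun 06 14:11:12 lustre-oss-node135 pacemaker-schedulerd[1069268] (log_list_item) notice: Actions: Move ost-4 ( lustre-oss-node144 -> lustre-oss-node31 )
"""

for (src, dst), count in summarize_moves(sample_log).items():
    print(src, "->", dst, count)
```

For the eight lines quoted above, every move has lustre-oss-node144 as the source and lustre-oss-node31 as the destination; none targets lustre-oss-node135.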
5. Attachments:
The attached files contain the cluster configuration (config.txt) and the log from lustre-oss-node135 (node135.log) covering the uneven migration.
Thank you for your attention and support.
Best regards
chenzufei at gmail.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.txt
Type: application/octet-stream
Size: 21328 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20250606/55e9a209/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: node135.log
Type: application/octet-stream
Size: 825308 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20250606/55e9a209/attachment-0003.obj>