[ClusterLabs] [EXT] crazy setup of fence_sbd, clustered MD, nvmet
Windl, Ulrich
u.windl at ukr.de
Wed Apr 16 05:15:03 UTC 2025
Hi!
A long time ago I configured a cluster using clustered MD like this:
crm(live/h16)configure# primitive prm_test_raid ocf:heartbeat:Raid1 params raidconf="/etc/mdadm/mdadm.conf" raiddev=/dev/md0 force_clones=true op start timeout=90s op stop timeout=90s op monitor interval=300 timeout=90s op_params OCF_CHECK_LEVEL=10 meta priority=123
crm(live/h16)configure# clone cln_test_raid prm_test_raid meta interleave=true meta priority=123
crm(live/h16)configure# colocation col_raid_DLM inf: ( cln_test_raid ) cln_DLM
crm(live/h16)configure# order ord_DLM_raid inf: cln_DLM ( cln_test_raid )
crm(live/h16)configure# verify
crm(live/h16)configure# commit
If more resources need DLM, just add them using this pattern.
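For completeness, cln_DLM above is the usual DLM controld clone; a minimal sketch (the resource names and timeouts here are just my defaults, adjust as needed):
crm(live/h16)configure# primitive prm_DLM ocf:pacemaker:controld op start timeout=90s op stop timeout=100s op monitor interval=60 timeout=60s
crm(live/h16)configure# clone cln_DLM prm_DLM meta interleave=true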
Kind regards,
Ulrich Windl
> -----Original Message-----
> From: Users <users-bounces at clusterlabs.org> On Behalf Of Artem
> Sent: Monday, April 14, 2025 6:44 PM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> <users at clusterlabs.org>
> Subject: [EXT] [ClusterLabs] crazy setup of fence_sbd, clustered MD, nvmet
>
> Dear gurus, I need your advice.
>
> We want to build a pacemaker cluster with the following resources.
> Could you please evaluate the idea and give feedback?
>
>
> Pairs of nodes with NVMe disks. Disks are shared from one node to
> the other via nvmet, with persistent udev names and partition IDs.
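> What I do on each exporting node is roughly this (a sketch; the NQN,
> device path and address below are placeholders):
>
> modprobe nvmet
> modprobe nvmet-tcp
> # subsystem with one namespace backed by the local NVMe disk
> sub=/sys/kernel/config/nvmet/subsystems/nqn.2025-04.io.example:nvme0
> mkdir $sub
> echo 1 > $sub/attr_allow_any_host
> mkdir $sub/namespaces/1
> echo -n /dev/nvme0n1 > $sub/namespaces/1/device_path
> echo 1 > $sub/namespaces/1/enable
> # TCP port, then export the subsystem through it
> port=/sys/kernel/config/nvmet/ports/1
> mkdir $port
> echo tcp > $port/addr_trtype
> echo ipv4 > $port/addr_adrfam
> echo -n 192.168.1.10 > $port/addr_traddr
> echo 4420 > $port/addr_trsvcid
> ln -s $sub $port/subsystems/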
> An MD RAID1 is made on top of each pair of disks from different
> nodes. I suspect it must be clustered MD, and that will require dlm?
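> For the array itself I have something like this in mind (a sketch;
> dlm_controld must already be running, device paths are placeholders):
>
> mdadm --create /dev/md0 --level=mirror --raid-devices=2 \
>     --bitmap=clustered /dev/nvme0n1p1 /dev/nvme1n1p1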
> 2 or 4 clustered LVM volume groups are made on top of the MD devices.
> Pacemaker location preference rules pin half of the VGs to one node
> and the other half to the other node.
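> I.e. something like this (a sketch assuming lvmlockd and the
> LVM-activate agent; names, timeouts and the score are illustrative):
>
> vgcreate --shared vg_md0 /dev/md0
> # and in crm configure:
> primitive prm_vg_md0 ocf:heartbeat:LVM-activate \
>     params vgname=vg_md0 vg_access_mode=lvmlockd \
>     op start timeout=90s op stop timeout=90s op monitor interval=60s
> location loc_vg_md0 prm_vg_md0 100: node1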
>
> Striped LVs on top of the VGs, with filesystems for the Lustre MDT
> and OSTs. Two main nodes are in Corosync; the other OST nodes are
> configured as remote resources.
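> The remote nodes would be defined along these lines (the hostname is
> a placeholder):
>
> primitive ost-node1 ocf:pacemaker:remote \
>     params server=ost-node1.example.com op monitor interval=30s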
>
> The OS network is separate from the iBMC network, and firewall rules
> deny this traffic, so I decided to use SBD for fencing.
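> I.e. disk-based SBD, roughly like this (a sketch; the device path is
> a placeholder):
>
> sbd -d /dev/disk/by-id/nvme-eui.0123456789abcdef create
> # /etc/sysconfig/sbd:
> SBD_DEVICE="/dev/disk/by-id/nvme-eui.0123456789abcdef"
> # and in crm configure:
> primitive fencing stonith:fence_sbd \
>     params devices="/dev/disk/by-id/nvme-eui.0123456789abcdef"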
>
>
> I have only found pieces of such a stack documented, for different
> OSes and from years ago. Now I'm trying to make it all work together.
> At the moment the clustered MD cannot be created, as it fails to
> create a lockspace (due to a dlm error?). And dlm-clone doesn't want
> to start, either on the main nodes or (as it should) on the remote
> nodes. OS = RHEL9.
>
> Maybe such a setup is too complicated? I'm trying to avoid
> split-brain situations and uncoordinated writes by two mdadm
> processes on different nodes in all failure scenarios.
> I know that a common approach is to use JBODs or SAN arrays, but we
> don't have them for this project.
>
> Thanks in advance.
> Kindest regards,
> Artem