[ClusterLabs] crazy setup of fence_sbd, clustered MD, nvmet

Artem tyomikh at gmail.com
Tue Apr 15 18:43:37 UTC 2025


Thank you. For SBD I use smaller NVMe targets exported from the other nodes.
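
Roughly what I have in mind for the fencing part (the device path below
is only a placeholder for the small shared namespace):

  # write the SBD header onto the shared slot device
  sbd -d /dev/disk/by-id/nvme-sbd0 create
  # enable SBD in the cluster and add a poison-pill fencing resource
  pcs stonith sbd enable device=/dev/disk/by-id/nvme-sbd0
  pcs stonith create sbd-fence fence_sbd devices=/dev/disk/by-id/nvme-sbd0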

What about clustered MD vs ordinary MD on top of NVMe targets? Does it
make sense, i.e. does it add anything to consistency in case of a
network outage or a node reboot?
Do I need vgcreate with the --shared option? Or is that again an
unnecessary complication here?
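
If clustered MD does make sense here, I would create it roughly like
this (device and VG names are placeholders):

  # clustered raid1 over the local disk and the disk imported via nvmet
  mdadm --create /dev/md/md0 --level=1 --raid-devices=2 \
        --bitmap=clustered /dev/local_part1 /dev/remote_part1
  # shared VG so lvmlockd/DLM coordinates activation between the nodes
  vgcreate --shared vg0 /dev/md/md0
  vgchange --lockstart vg0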

DLM finally started; I had forgotten to install the core package (only
dlm-lib was installed).
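
For reference, the DLM/lvmlockd part of my cluster config is roughly
the following (resource names are mine):

  dnf install dlm lvm2-lockd
  pcs resource create dlm ocf:pacemaker:controld \
      op monitor interval=30s on-fail=fence clone interleave=true ordered=true
  pcs resource create lvmlockd ocf:heartbeat:lvmlockd \
      op monitor interval=30s on-fail=fence clone interleave=true ordered=true
  pcs constraint order start dlm-clone then lvmlockd-clone
  pcs constraint colocation add lvmlockd-clone with dlm-clone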

On Tue, 15 Apr 2025 at 21:04, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
>
> 14.04.2025 19:43, Artem wrote:
> > Dear gurus, I need your advice.
> >
> > We want to build a pacemaker cluster with the following resources.
> > Could you please evaluate the idea and give feedback?
> >
> >
> > Pairs of nodes with NVMe disks. Disks are shared from one node to
> > another via nvmet. Persistent udev names and partition ids.
> > MD raid1 is made on top of pairs of disks from different nodes. I
> > suspect it must be clustered MD, and it'll require dlm?
> > 2 or 4 clustered LVM volume groups are made on top of MD devices.
> > Pacemaker location preference rules pin half of the VGs to one node
> > and the other half to the other node.
> >
> > Striped LVs on top of the VGs, with filesystems for Lustre MDT and
> > OST. 2 main nodes are in Corosync; the other OST nodes are
> > configured as remote resources.
> >
> > The OS network is separate from the iBMC network, and firewall rules
> > deny this traffic, so I decided to use SBD for fencing.
> >
>
> SBD requires a shared independent device. Using disks local to each
> cluster node for SBD defeats its purpose.
>
> >
> > I only found some pieces of such a stack documented, for different
> > OSes and from different years. Now I'm trying to make it work
> > together. At the moment the clustered MD cannot be created, as it
> > fails to create a lockspace (due to a dlm error?). And dlm-clone
> > doesn't want to start either on the main nodes or (as it should) on
> > the remote nodes. OS = RHEL9.
> >
> > Maybe such a setup is too complicated? I am trying to avoid
> > split-brain situations and uncoordinated writes by two mdadm
> > processes on different nodes in all failure scenarios.
> > I know that a common approach is to use JBODs or SAN arrays, but we
> > don't have them for this project.
> >
> > Thanks in advance.
> > Kindest regards,
> > Artem
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
>
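
For reference (below the quote): the nvmet export on each node looks
roughly like the following; the NQN, IP address and backing device are
placeholders, and TCP transport is assumed.

  modprobe nvmet
  modprobe nvmet-tcp
  cd /sys/kernel/config/nvmet
  mkdir subsystems/nqn.2025-04.example:disk0
  echo 1 > subsystems/nqn.2025-04.example:disk0/attr_allow_any_host
  mkdir subsystems/nqn.2025-04.example:disk0/namespaces/1
  echo /dev/nvme0n1p2 > subsystems/nqn.2025-04.example:disk0/namespaces/1/device_path
  echo 1 > subsystems/nqn.2025-04.example:disk0/namespaces/1/enable
  mkdir ports/1
  echo tcp        > ports/1/addr_trtype
  echo ipv4       > ports/1/addr_adrfam
  echo 192.0.2.11 > ports/1/addr_traddr
  echo 4420       > ports/1/addr_trsvcid
  ln -s /sys/kernel/config/nvmet/subsystems/nqn.2025-04.example:disk0 ports/1/subsystems/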


More information about the Users mailing list