[ClusterLabs] crazy setup of fence_sbd, clustered MD, nvmet
Klaus Wenninger
kwenning at redhat.com
Wed Apr 16 08:20:08 UTC 2025
On Tue, Apr 15, 2025 at 8:04 PM Andrei Borzenkov <arvidjaar at gmail.com>
wrote:
> 14.04.2025 19:43, Artem wrote:
> > Dear gurus, I need your advice.
> >
> > We want to build a pacemaker cluster with the following resources.
> > Could you please evaluate the idea and give feedback?
> >
> >
> > Pairs of nodes with NVMe disks. Disks are shared from one node to
> > another via nvmet. Persistent udev names and partition ids.
> > MD raid1 is made on top of pairs of disks from different nodes. I
> > suspect it must be clustered MD, and it'll require dlm?
> > 2 or 4 clustered LVM volume groups are made on top of MD devices.
> > Pacemaker location preference rules for half of VGs to one node and
> > another half to another node.
> >
> > Striped LVs on top of VG with FS for Lustre MDT and OST. 2 main nodes
> > in Corosync, other OST nodes are configured as remote resources.
> >
> > OS network is separate from iBMC, and firewall rules deny this
> > traffic, so I decided to use SBD for fencing.
> >
>
> SBD requires a shared independent device. Using disks local to each
> cluster node for SBD defeats its purpose.
>
Agreed! Maybe just one more thing to add before it comes up as
a possible solution:
You might think of sharing the disks via some mechanism to the
respective other node and using SBD poison-pill with 2 disks.
That would probably prevent split-brain, but imagine the other node
has some issue and you want to fence it. You would probably not be
able to access the disk it shares, and for successful fencing in a
2-disk scenario you need to be able to write the poison pill to both.
Such a setup might, on the other hand, make sense in a 3-node cluster -
at least under certain circumstances.
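For illustration, a two-device poison-pill setup would look roughly like
this (only a sketch - the device paths and resource names below are
placeholders, not anything from the original setup):

    # initialize the poison-pill slots on each shared device
    sbd -d /dev/disk/by-id/nvme-shared-a create
    sbd -d /dev/disk/by-id/nvme-shared-b create

    # /etc/sysconfig/sbd - semicolon-separated device list; sbd wants a
    # majority of the configured devices (with 2 that means both) writable
    SBD_DEVICE="/dev/disk/by-id/nvme-shared-a;/dev/disk/by-id/nvme-shared-b"
    SBD_WATCHDOG_DEV=/dev/watchdog

    # stonith resource using fence_sbd (devices comma-separated here)
    pcs stonith create fence-sbd fence_sbd \
        devices="/dev/disk/by-id/nvme-shared-a,/dev/disk/by-id/nvme-shared-b"

With three nodes each exporting one device, a majority (2 of 3) can still
be written even when the node to be fenced - and the device it exports -
is unreachable, which is why the 3-node case looks better.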
Regards,
Klaus
>
> >
> > I only found some pieces of such a stack documented, for different OSes,
> > from different years. Now I'm trying to make it work together. At the
> > moment the clustered MD cannot be created as it fails to create a
> > lockspace (due to dlm error?). And dlm-clone doesn't want to start
> > either on main nodes or (as it should) on remote nodes. OS = RHEL9.
> >
> > Maybe such a setup is too complicated? I am trying to avoid split-brain
> > situations and uncoordinated writes by 2 mdadm processes on different
> > nodes in all failure scenarios.
> > I know that a common approach is to use JBODs or SAN arrays, but we
> > don't have them for this project.
> >
> > Thanks in advance.
> > Kindest regards,
> > Artem
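Regarding the dlm-clone failure quoted above: dlm (and with it clustered MD
and lvmlockd) needs corosync membership, so it can only run on the full
cluster nodes - pacemaker-remote nodes have no corosync, and dlm-clone will
never start there. The usual ordering on RHEL 9 looks roughly like this
(again just a sketch; resource and device names are placeholders):

    # dlm on the corosync members only
    pcs resource create dlm ocf:pacemaker:controld \
        op monitor interval=30s on-fail=fence \
        clone interleave=true ordered=true
    # keep the clone off the remote nodes, e.g. with location constraints

    # lvmlockd, only needed for shared (clustered) LVM volume groups
    pcs resource create lvmlockd ocf:heartbeat:lvmlockd \
        op monitor interval=30s on-fail=fence \
        clone interleave=true ordered=true
    pcs constraint order start dlm-clone then lvmlockd-clone

    # clustered MD raid1 across the two nvmet-shared disks
    # (requires dlm to be running; the bitmap uses the corosync cluster name)
    mdadm --create /dev/md/shared0 --level=mirror --raid-devices=2 \
        --bitmap=clustered /dev/disk/by-id/nvme-local /dev/disk/by-id/nvme-peer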