<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Tue, Apr 15, 2025 at 8:04 PM Andrei Borzenkov <<a href="mailto:arvidjaar@gmail.com">arvidjaar@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">14.04.2025 19:43, Artem wrote:<br>

> Dear gurus, I need your advice.<br>

> <br>

> We want to build a pacemaker cluster with the following resources.<br>

> Could you please evaluate the idea and give feedback?<br>

> <br>

> <br>

> Pairs of nodes with NVMe disks. Disks are shared from one node to<br>

> another via nvmet. Persistent udev names and partition ids.<br>

> MD raid1 is made on top of pairs of disks from different nodes. I<br>

> suspect it must be clustered MD, and it'll require dlm?<br>

> 2 or 4 clustered LVM volume groups are made on top of MD devices.<br>

> Pacemaker location preference rules for half of VGs to one node and<br>

> another half to another node.<br>

> <br>

> Striped LVs on top of VG with FS for Lustre MDT and OST. 2 main nodes<br>

> in Corosync, other OST nodes are configured as remote resources.<br>

> <br>

> OS network is separate from iBMC, and firewall rules deny this<br>

> traffic, so I decided to use SBD for fencing.<br>

> <br>

<br>

SBD requires a shared independent device. Using disks local to each <br>

cluster node for SBD defeats its purpose.<br></blockquote><div><br></div><div>Agreed! Just one more thing to add before it comes up as</div><div>a possible solution:</div><div><br></div><div>You might think of sharing each node's disk with the other node via</div><div>some mechanism and using sbd poison-pill with 2 disks.</div><div>This would probably prevent split-brain, but imagine the other node</div><div>has an issue and you want to fence it. You would probably not be</div><div>able to access the disk it shares, and for successful fencing in</div><div>a 2-disk scenario you need to be able to write the poison-pill to both disks.</div><div>On the other hand, such a setup might make sense in a 3-node cluster,</div><div>at least under certain circumstances.</div><div><br></div><div>Regards,</div><div>Klaus </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

> <br>

> I only found some pieces of such a stack documented, for different OSes<br>

> and from different years. Now I'm trying to make it work together. At the<br>

> moment the clustered MD cannot be created as it fails to create a<br>

> lockspace (due to dlm error?). And dlm-clone doesn't want to start<br>

> either on main nodes or (as it should) on remote nodes. OS = RHEL9.<br>

> <br>

> Maybe such a setup is too complicated? I am trying to avoid split-brain<br>

> situations and uncoordinated writes by 2 mdadm processes on different<br>

> nodes in all failure scenarios.<br>

> I know that a common approach is to use JBODs or SAN arrays. But we<br>

> don't have them for this project.<br>

> <br>

> Thanks in advance.<br>

> Kindest regards,<br>

> Artem<br>

> _______________________________________________<br>

> Manage your subscription:<br>

> <a href="https://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>

> <br>

> ClusterLabs home: <a href="https://www.clusterlabs.org/" rel="noreferrer" target="_blank">https://www.clusterlabs.org/</a><br>

<br>


</blockquote></div></div>
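<div dir="ltr">To illustrate the 2-disk versus 3-node point from the reply above, here is a minimal sketch in Python. It assumes that a poison-pill counts as delivered only when it can be written to a strict majority of the configured disks. That assumption is consistent with the statement that a 2-disk setup needs both disks, but it is not taken from the sbd source. The function names <code>pills_needed</code> and <code>can_fence</code> are hypothetical and are not part of any sbd tooling.</div>

```python
def pills_needed(total_disks: int) -> int:
    """Smallest number of disks forming a strict majority (assumed rule)."""
    return total_disks // 2 + 1


def can_fence(total_disks: int, reachable_disks: int) -> bool:
    """True if enough disks are writable to deliver the poison-pill.

    Hypothetical helper: models only the counting argument, not sbd itself.
    """
    if not (total_disks >= 1 and total_disks >= reachable_disks >= 0):
        raise ValueError("need total_disks >= 1 and 0..total_disks reachable")
    return reachable_disks >= pills_needed(total_disks)


# 2-node cluster, each node exports one disk: when the peer has a problem,
# its disk is likely unreachable, so only the local disk remains.
print("2 disks, 1 reachable:", can_fence(total_disks=2, reachable_disks=1))

# 3-node cluster, each node exports one disk: losing one node still leaves
# two of three disks reachable for the survivors.
print("3 disks, 2 reachable:", can_fence(total_disks=3, reachable_disks=2))
```

<div dir="ltr">Under that assumption, the 2-disk case cannot fence a peer whose disk has gone away, while the 3-disk case still can after losing a single node.</div>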