[ClusterLabs] [EXT] Re: Rebuild of failed node

Fri Jun 6 06:05:11 UTC 2025

Fabrizio,

Yes, it’s easier to configure all cluster nodes at once. Years ago I added an extra node to an existing cluster, and it didn’t work as smoothly as expected. Cleanly re-adding a node is probably:

  1.  Cleanly remove a node
  2.  Cleanly add a node

😉
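
For what it's worth, the two steps above might be sketched with pcs roughly as follows. This is only a minimal sketch under assumptions: it uses pcs 0.10+ command syntax, it is meant to be run on the surviving node, pcsd is assumed to be running on the reinstalled host with the hacluster password already set, and `node2` and the helper name `replace_node_plan` are made up for illustration. The function only prints the commands so they can be reviewed first; it does not touch the cluster.

```shell
#!/bin/sh
# Sketch only: print a remove-then-add sequence for one cluster node.
# Assumes pcs 0.10+ syntax; the node name passed in is hypothetical.

replace_node_plan() {
    node="$1"
    # 1. Drop the stale entry for the wiped node from the cluster configuration.
    echo "pcs cluster node remove $node"
    # 2. Re-authenticate pcsd on the reinstalled host (prompts for the hacluster password).
    echo "pcs host auth $node -u hacluster"
    # 3. Add the host back and start/enable cluster services on it.
    echo "pcs cluster node add $node --start --enable"
}
```

Calling `replace_node_plan node2` prints the three commands in order; they can then be run one at a time, checking `pcs status` in between. In a 2-node cluster pcs may warn or ask for confirmation when the last peer is removed, so the pcs man page for the installed version should be checked rather than taking this sketch as authoritative.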

Kind regards,
Ulrich Windl

From: Users <users-bounces at clusterlabs.org> On Behalf Of Fabrizio Ermini
Sent: Monday, May 12, 2025 12:56 PM
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Subject: [EXT] Re: [ClusterLabs] Rebuild of failed node

Hi Ulrich, and thanks for your reply. In this case, the missing node has been wiped and formatted. The reason wasn't an actual fault, but rather the need to change the storage configuration (we had to comply with a security requirement that involved adding encryption at rest to the cluster volumes). I also intended to use this activity as a test bed to learn the correct procedure to follow in case of a node loss, so that I would already know how to proceed in an emergency.

These servers are not backed up: they are meant to work as edge servers, collecting data and shipping it up to the main production servers. We have a golden image that lets us reinstall them quickly, but the installation procedure assumes that both nodes are installed together. At the moment I don't have a procedure to reinstall just one of the nodes, and that's what I'd like to create.

I hope this answers your questions. Best regards,
Fabrizio

On Mon, May 12, 2025 at 08:41, Windl, Ulrich <u.windl at ukr.de> wrote:
Maybe explain what “failed node” and “rebuild” actually mean:
Was it fenced, was it reinstalled, or did you have a fatal disk failure?
Usually a backup is your best friend.

Kind regards,
Ulrich Windl

From: Users <users-bounces at clusterlabs.org> On Behalf Of Fabrizio Ermini
Sent: Friday, May 9, 2025 4:26 PM
To: users at clusterlabs.org
Subject: [EXT] [ClusterLabs] Rebuild of failed node

Hi all! Freshman here, just joined.

I currently need to rebuild a failed node on a 2-node Pacemaker 2.1 / Corosync 3.1 cluster with DRBD storage.
I've searched the Pacemaker docs and the list archives, but I haven't found a clear guide on how to proceed with this task. So far, I've installed a new server, configured it with the same IP and hostname as the failed one, and installed all the software. I've also fixed the DRBD layer and started the resync of the volumes. But it's not clear to me how to proceed from here: I've found some hints online pointing to the need to manually copy the corosync config, but they were quite old and probably obsolete. I'm using pcs as a shell, and I haven't found a command designed to replace a node, only commands to add or remove one.
It seems really strange to me that there isn't a guide, since this should be a very basic operation and it's quite important to know how to do it. HW breaks, as a matter of fact :D
So I'd be very grateful if anyone could point me in the right direction.
Thanks in advance, and best regards

Fabrizio

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/