[ClusterLabs] Rebuild of failed node
Tomas Jelinek
tojeline at redhat.com
Tue May 13 10:07:54 UTC 2025
Hi Fabrizio,
You are right, there is no pcs command for replacing a node. Instead,
you can run pcs commands on the surviving node to remove the failed node
and then add it back. You need to authenticate the node with pcs first
(pcs host auth). Removing and re-adding the node syncs most of the files
(authkeys, corosync.conf, qdevice certificates) for you. Pcs doesn't
handle DRBD configuration.
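
For illustration, the remove-and-add sequence could look roughly like the sketch below, run on the surviving node. It assumes pcs 0.10 or later, that pcsd is running on the rebuilt node, and that the hacluster password is already set there. The node name node2 and the run helper are placeholders of this sketch, not part of pcs. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to really run them on the surviving node.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Remove the failed node from the cluster and add the rebuilt one back.
replace_node() {
    node="$1"
    # Authenticate against pcsd on the rebuilt node (prompts for the password).
    run pcs host auth "$node" -u hacluster
    # Drop the old membership, then re-add the node; the add step
    # distributes corosync.conf and the authkeys to it.
    run pcs cluster node remove "$node"
    run pcs cluster node add "$node" --start --enable
}

# node2 is a placeholder name for the rebuilt node.
replace_node "${NODE:-node2}"
```

Depending on the state of the cluster, pcs may ask for confirmation or extra options when removing a node, so read its output before going on to the add step.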
Regards,
Tomas
On 13. 05. 25 at 9:19, Fabrizio Ermini wrote:
> Thank you very much Alexey, I will certainly try that and update you on
> the result.
>
> Best regards!
>
>
> On Mon, 12 May 2025 at 22:36, <alexey at pavlyuts.ru> wrote:
>
> Hi,
>
> As it happens, I use Pacemaker as the base layer of a custom clustering
> solution, and I have a script to rebuild the second node from the
> first one. I can't share the script itself as it has a lot of
> solution-dependent references, but I can share the sequence to
> rebuild the failed node:
>
> 1. Set up the new node with the same IP and hostname.
> 2. (Optional) Set up passwordless mutual key-based SSH access. It is
>    not necessary, but it makes a lot of things easier.
> 3. Copy these files from the surviving host to the new one:
>    1. /etc/corosync/authkey
>    2. /etc/corosync/corosync.conf
>    3. /etc/drbd.d/*.res
>    4. /etc/pacemaker/authkey
> 4. Set the *hacluster* user's password to the same value as on the
>    surviving node.
> 5. Re-authenticate the pcs nodes with the command:
>    pcs host auth <host1_name> <host2_name> -u hacluster -p <ha_cluster_pass>
> 6. Reboot the restored server.
> 7. PROFIT!!!
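
Steps 3 to 6 of the list above can be sketched as a small script run on the surviving node. This is only an illustration, not the script mentioned in the quoted message: it assumes root SSH access to the new node (step 2), and the host names node1 and node2 and the run helper are placeholders. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to really run them on the surviving node.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

rebuild_node() {
    survivor="$1"
    new="$2"

    # Step 3: copy the cluster keys and configuration to the new node.
    for f in /etc/corosync/authkey /etc/corosync/corosync.conf /etc/pacemaker/authkey; do
        run scp -p "$f" "root@$new:$f"
    done
    # The glob is expanded locally, on the surviving node.
    run scp -p /etc/drbd.d/*.res "root@$new:/etc/drbd.d/"

    # Step 4: set the hacluster password interactively on the new node.
    run ssh -t "root@$new" passwd hacluster

    # Step 5: re-authenticate both hosts; pcs prompts for the password
    # when -p is omitted, which keeps it out of the shell history.
    run pcs host auth "$survivor" "$new" -u hacluster

    # Step 6: reboot the restored server.
    run ssh "root@$new" systemctl reboot
}

# node1 and node2 are placeholder host names.
rebuild_node "${SURVIVOR:-node1}" "${NEW_NODE:-node2}"
```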
>
> If you do not use an arbiter (corosync-qnetd), this should be enough to
> get your new cluster node up and running. If you use corosync-qnetd, you
> also need to restore the corosync-qdevice nssdb keys so that the second
> host can connect to the arbiter node:
>
> 1. On the surviving host, extract the arbiter's CA certificate from
>    the nssdb:
>    certutil -L -d /etc/corosync/qdevice/net/nssdb -n 'QNet CA' -r > /root/qnetd-cert.crt
> 2. Copy the certificate to the new host; the steps below assume the
>    same path there.
> 3. On the new host, initialize a new nssdb with the certificate:
>    corosync-qdevice-net-certutil -i -c /root/qnetd-cert.crt
> 4. Copy the certificate and key file
>    /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12 from the old
>    node to the new one.
> 5. On the new node, import the certificate and key:
>    corosync-qdevice-net-certutil -m -c /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12
> 6. Enable or restart corosync-qdevice:
>    systemctl enable --now corosync-qdevice.service
>    or
>    systemctl restart corosync-qdevice.service
> 7. Enjoy!
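
The qdevice steps above can likewise be strung together from the surviving node. The sketch below reuses the commands and paths quoted in the list; root SSH access to the new node, the host name node2, and the run helper are assumptions of this sketch. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to really run them on the surviving node.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

NSSDB=/etc/corosync/qdevice/net/nssdb
CA_CERT=/root/qnetd-cert.crt
NODE_P12="$NSSDB/qdevice-net-node.p12"

restore_qdevice() {
    new="$1"
    # 1. Export the arbiter's CA certificate from the local nssdb.
    #    The redirect is wrapped in sh -c so that dry-run mode does not
    #    create or overwrite the certificate file.
    run sh -c "certutil -L -d $NSSDB -n 'QNet CA' -r > $CA_CERT"
    # 2. Copy the CA certificate to the same path on the new node.
    run scp -p "$CA_CERT" "root@$new:$CA_CERT"
    # 3. Initialize a fresh nssdb on the new node from that certificate.
    run ssh "root@$new" corosync-qdevice-net-certutil -i -c "$CA_CERT"
    # 4. Copy the node certificate and key bundle.
    run scp -p "$NODE_P12" "root@$new:$NODE_P12"
    # 5. Import the bundle into the new nssdb.
    run ssh "root@$new" corosync-qdevice-net-certutil -m -c "$NODE_P12"
    # 6. Enable and start corosync-qdevice on the new node.
    run ssh "root@$new" systemctl enable --now corosync-qdevice.service
}

# node2 is a placeholder name for the rebuilt node.
restore_qdevice "${NEW_NODE:-node2}"
```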
>
> That's what works for me in practice, and it is included in the service
> scripts of our product, which is based on Pacemaker.
>
> Hope this helps!
>
> Sincerely,
>
> Alex
>
> From: Users <users-bounces at clusterlabs.org> On Behalf Of Fabrizio Ermini
> Sent: Friday, May 9, 2025 5:26 PM
> To: users at clusterlabs.org
> Subject: [ClusterLabs] Rebuild of failed node
>
> Hi all! Freshman here, just joined.
>
> I currently need to rebuild a failed node on a Pacemaker 2.1 /
> Corosync 3.1 two-node cluster with DRBD storage.
>
> I've searched the Pacemaker docs and the list archives, but I
> haven't found a clear guide on how to proceed with this task. So far,
> I've installed a new server, configured the same IP and hostname
> as the failed one, and installed all the software. I've also fixed the
> DRBD layer and started the resync of the volumes. But it's not clear
> to me how to proceed: I've found some hints online pointing to the
> need to manually copy the corosync config, but they were quite old
> and probably obsolete. I'm using pcs as a shell and I haven't found
> a command designed to replace a node, only to add or remove them.
>
> It seems really strange to me that there isn't a guide, since this
> should be a very basic operation and it's quite important to know
> how to do it. Hardware breaks, as a matter of fact :D
>
> So I'll be very grateful if anyone can point me in the right
> direction.
>
> Thanks in advance, and best regards
>
> Fabrizio
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/