[ClusterLabs] What/how to clean up when bootstrapping new cluster (or: I have a phantom node)
Andreas Hasenack
andreas at canonical.com
Tue May 24 16:05:38 EDT 2022
Hi,
I'm trying to find out the correct steps to start a corosync/pacemaker
cluster right after installing its packages in Debian or Ubuntu.
I'm deliberately not using crmsh or pcs; I really want to get this
basic initial step working without them.
Right after install, the default config has this nodelist:
nodelist {
    # Change/uncomment/add node sections to match cluster configuration
    node {
        # Hostname of the node
        name: node1
        # Cluster membership node identifier
        nodeid: 1
        # Address of first link
        ring0_addr: 127.0.0.1
        # When knet transport is used it's possible to define up to 8 links
        #ring1_addr: 192.168.1.1
    }
    # ...
}
(full default pristine config: https://pastebin.ubuntu.com/p/htBkCvBWqr/)
This results in a crm_mon output of:
Cluster Summary:
* Stack: corosync
* Current DC: node1 (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Tue May 24 19:57:05 2022
* Last change: Tue May 24 19:56:59 2022 by hacluster via crmd on node1
* 1 node configured
* 0 resource instances configured
Node List:
* Online: [ node1 ]
Active Resources:
* No active resources
I also tried with corosync 3.1.6 and pacemaker 2.1.2, btw.
I then proceed to make changes to corosync.conf, giving it a real
hostname, ring IP, and node ID:
nodelist {
    # Change/uncomment/add node sections to match cluster configuration
    node {
        # Hostname of the node
        name: f4
        # Cluster membership node identifier
        nodeid: 104
        # Address of first link
        ring0_addr: 10.226.63.102
        # When knet transport is used it's possible to define up to 8 links
        #ring1_addr: 192.168.1.1
    }
    # ...
}
Restart the services:
systemctl restart pacemaker corosync
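For what it's worth, I also tried a full stop/start in order, on the
theory that pacemaker should join a corosync membership that already
reflects the new nodelist (the commands below are standard, but whether
the ordering matters here is my assumption):

```shell
# Stop pacemaker first, then corosync; start in the reverse order.
systemctl stop pacemaker corosync
systemctl start corosync
systemctl start pacemaker

# Compare what corosync thinks the membership is...
corosync-quorumtool -s
# ...with the node list pacemaker has
crm_node -l
```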
But now I have this phantom "node1" in the cluster, and the cluster
thinks it has two nodes:
Cluster Summary:
* Stack: corosync
* Current DC: f4 (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Tue May 24 19:59:56 2022
* Last change: Tue May 24 19:59:22 2022 by hacluster via crmd on f4
* 2 nodes configured
* 0 resource instances configured
Node List:
* Node node1: UNCLEAN (offline)
* Online: [ f4 ]
Active Resources:
* No active resources
What is the cleanup step (or steps) that I'm missing? Or are there so
many details that it's best to leave this to pcs/crmsh?
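The closest thing I've found to a cleanup is removing the stale node by
hand, roughly like below, but I don't know if this is the intended
procedure or just papering over something I should have done earlier
(the commands are real, the approach is my guess):

```shell
# Tell pacemaker to forget the stale node entirely
crm_node --remove node1 --force

# Or, with everything stopped, wipe the cached CIB so the next start
# rebuilds it from corosync.conf. This destroys any existing cluster
# configuration, so it only seems sensible on a brand-new cluster.
systemctl stop pacemaker corosync
rm /var/lib/pacemaker/cib/*
systemctl start corosync pacemaker
```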