[ClusterLabs] What/how to clean up when bootstrapping new cluster (or: I have a phantom node)
Andreas Hasenack
andreas at canonical.com
Tue May 24 16:05:38 EDT 2022
Hi,
I'm trying to find out the correct steps to start a corosync/pacemaker
cluster right after installing its packages in Debian or Ubuntu.
I'm deliberately not using crmsh or pcs; I really want to get this
basic initial step working without them.
Right after install, the default config has this nodelist:
nodelist {
    # Change/uncomment/add node sections to match cluster configuration
    node {
        # Hostname of the node
        name: node1
        # Cluster membership node identifier
        nodeid: 1
        # Address of first link
        ring0_addr: 127.0.0.1
        # When knet transport is used it's possible to define up to 8 links
        #ring1_addr: 192.168.1.1
    }
    # ...
}
(full default pristine config: https://pastebin.ubuntu.com/p/htBkCvBWqr/)
This results in a crm_mon output of:
Cluster Summary:
* Stack: corosync
* Current DC: node1 (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Tue May 24 19:57:05 2022
* Last change: Tue May 24 19:56:59 2022 by hacluster via crmd on node1
* 1 node configured
* 0 resource instances configured
Node List:
* Online: [ node1 ]
Active Resources:
* No active resources
I also tried with corosync 3.1.6 and pacemaker 2.1.2, btw.
I then proceed to make changes to corosync.conf, giving it a real
hostname, ring IP, and node ID:
nodelist {
    # Change/uncomment/add node sections to match cluster configuration
    node {
        # Hostname of the node
        name: f4
        # Cluster membership node identifier
        nodeid: 104
        # Address of first link
        ring0_addr: 10.226.63.102
        # When knet transport is used it's possible to define up to 8 links
        #ring1_addr: 192.168.1.1
    }
    # ...
}
Restart the services:
systemctl restart pacemaker corosync
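For what it's worth, I also tried a full stop/start in order, on the
theory that pacemaker should join a corosync membership that already
reflects the new nodelist (the commands below are standard, but whether
the ordering matters here is my assumption):

```shell
# Stop pacemaker first, then corosync; start in the reverse order.
systemctl stop pacemaker corosync
systemctl start corosync
systemctl start pacemaker

# Compare what corosync thinks the membership is...
corosync-quorumtool -s
# ...with the node list pacemaker has
crm_node -l
```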
But now I have this phantom "node1" in the cluster, and the cluster
thinks it has two nodes:
Cluster Summary:
* Stack: corosync
* Current DC: f4 (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Tue May 24 19:59:56 2022
* Last change: Tue May 24 19:59:22 2022 by hacluster via crmd on f4
* 2 nodes configured
* 0 resource instances configured
Node List:
* Node node1: UNCLEAN (offline)
* Online: [ f4 ]
Active Resources:
* No active resources
What is the cleanup step (or steps) that I'm missing? Or are there so
many details that it's best to leave this to pcs/crmsh?
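The closest thing I've found to a cleanup is removing the stale node by
hand, roughly like below, but I don't know if this is the intended
procedure or just papering over something I should have done earlier
(the commands are real, the approach is my guess):

```shell
# Tell pacemaker to forget the stale node entirely
crm_node --remove node1 --force

# Or, with everything stopped, wipe the cached CIB so the next start
# rebuilds it from corosync.conf. This destroys any existing cluster
# configuration, so it only seems sensible on a brand-new cluster.
systemctl stop pacemaker corosync
rm /var/lib/pacemaker/cib/*
systemctl start corosync pacemaker
```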