[ClusterLabs] What/how to clean up when bootstrapping new cluster (or: I have a phantom node)
Ken Gaillot
kgaillot at redhat.com
Tue May 24 17:34:10 EDT 2022
On Tue, 2022-05-24 at 20:05 +0000, Andreas Hasenack wrote:
> Hi,
>
> I'm trying to find out the correct steps to start a
> corosync/pacemaker
> cluster right after installing its packages in Debian or Ubuntu.
>
> I'm not using crmsh or pcs on purpose, I really wanted to get this
> basic initial step working without those.
>
> Right after install, the default config has this nodelist:
> nodelist {
>     # Change/uncomment/add node sections to match cluster configuration
>
>     node {
>         # Hostname of the node
>         name: node1
>         # Cluster membership node identifier
>         nodeid: 1
>         # Address of first link
>         ring0_addr: 127.0.0.1
>         # When knet transport is used it's possible to define up to 8 links
>         #ring1_addr: 192.168.1.1
>     }
>     # ...
> }
>
>
> (full default pristine config:
> https://pastebin.ubuntu.com/p/htBkCvBWqr/)
>
> This results in a crm_mon output of:
>
> Cluster Summary:
> * Stack: corosync
> * Current DC: node1 (version 2.0.3-4b1f869f0f) - partition with quorum
> * Last updated: Tue May 24 19:57:05 2022
> * Last change: Tue May 24 19:56:59 2022 by hacluster via crmd on node1
> * 1 node configured
> * 0 resource instances configured
>
> Node List:
> * Online: [ node1 ]
>
> Active Resources:
> * No active resources
>
> I also tried with corosync 3.1.6 and pacemaker 2.1.2, btw.
>
> I then proceed to make changes to corosync.conf. I give it a real
> hostname, ring IP, and node id:
> nodelist {
>     # Change/uncomment/add node sections to match cluster configuration
>
>     node {
>         # Hostname of the node
>         name: f4
>         # Cluster membership node identifier
>         nodeid: 104
>         # Address of first link
>         ring0_addr: 10.226.63.102
>         # When knet transport is used it's possible to define up to 8 links
>         #ring1_addr: 192.168.1.1
>     }
>     # ...
> }
>
>
> Then I restart the services:
>
> systemctl restart pacemaker corosync
>
> But now I have this phantom "node1" in the cluster, and the cluster
> thinks it has two nodes:
>
> Cluster Summary:
> * Stack: corosync
> * Current DC: f4 (version 2.0.3-4b1f869f0f) - partition with quorum
> * Last updated: Tue May 24 19:59:56 2022
> * Last change: Tue May 24 19:59:22 2022 by hacluster via crmd on f4
> * 2 nodes configured
> * 0 resource instances configured
>
> Node List:
> * Node node1: UNCLEAN (offline)
> * Online: [ f4 ]
>
> Active Resources:
> * No active resources
>
>
> What is the cleanup step (or steps) that I'm missing? Or are there so
> many details that it's best to leave this to pcs/crmsh?
crm_node --remove node1

or just don't start pacemaker until corosync is correct. pcs/crmsh are
definitely much easier to use (especially as the number of nodes grows),
but if you're looking to learn the low-level details, there's nothing
wrong with that.
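As a sketch of that ordering (assuming the systemd unit names "corosync"
and "pacemaker" used on Debian/Ubuntu, and the stock config path; adjust
for your layout), the sequence would look something like:

```shell
# Keep pacemaker down while corosync.conf still describes the wrong node
systemctl stop pacemaker

# ... edit /etc/corosync/corosync.conf: set the real name, nodeid,
# and ring0_addr in the nodelist section ...

# Restart membership with the corrected nodelist, then start pacemaker,
# which now only ever sees the real node
systemctl restart corosync
systemctl start pacemaker

# If pacemaker already cached the phantom node in the CIB, purge it
crm_node --remove node1
```

These commands need a live cluster stack, so this is illustrative rather
than something to paste blindly.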
--
Ken Gaillot <kgaillot at redhat.com>