[ClusterLabs] Antw: [EXT] Cannot add a node with pcs
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Wed Jul 13 02:00:56 EDT 2022
>>> Piotr Szafarczyk <piotr-l at netexpert.pl> wrote on 12.07.2022 at 12:34 in
message <38ccc24a-7b01-561c-20f8-ec2273a1894f at netexpert.pl>:
> Hi,
>
> I used to have a working cluster with 3 nodes (and stonith disabled).
The SLES guide says:
Important: No Support Without STONITH
You must have a node fencing mechanism for your cluster.
The global cluster options stonith-enabled and startup-fencing must be
set to true. When you change them, you lose support.
Maybe that helps.
> After an unexpected restart of one node, the cluster split. Node #2
> started to see the others as unclean. Nodes 1 and 3 were cooperating
> with each other, showing #2 as offline. There were no network connection
> problems.
>
> I removed #2 (operating from #1) with
> pcs cluster node remove n2
>
> I verified that it had removed all configuration from #2, both for
> corosync and for pacemaker. The cluster appears to be working correctly
> with two nodes (and no traces of #2).
>
> Now I am trying to add the third node back.
> pcs cluster node add n2
> Disabling SBD service...
> n2: sbd disabled
> Sending 'corosync authkey', 'pacemaker authkey' to 'n2'
> n2: successful distribution of the file 'corosync authkey'
> n2: successful distribution of the file 'pacemaker authkey'
> Sending updated corosync.conf to nodes...
> n3: Succeeded
> n2: Succeeded
> n1: Succeeded
> n3: Corosync configuration reloaded
>
> I am able to start #2, operating from #1:
>
> pcs cluster pcsd-status
> n2: Online
> n3: Online
> n1: Online
>
> pcs cluster enable n2
> pcs cluster start n2
>
> I can see that corosync's configuration has been updated, but
> pacemaker's not.
>
> _Checking from #1:_
>
> pcs config
> Cluster Name: n
> Corosync Nodes:
> n1 n3 n2
> Pacemaker Nodes:
> n1 n3
> [...]
>
> pcs status
> * 2 nodes configured
> Node List:
> * Online: [ n1 n3 ]
> [...]
>
> pcs cluster cib scope=nodes
> <nodes>
> <node id="1" uname="n1"/>
> <node id="3" uname="n3"/>
> </nodes>
>
> _#2 is seeing the state differently:_
>
> pcs config
> Cluster Name: n
> Corosync Nodes:
> n1 n3 n2
> Pacemaker Nodes:
> n1 n2 n3
>
> pcs status
> * 3 nodes configured
> Node List:
> * Online: [ n2 ]
> * OFFLINE: [ n1 n3 ]
> Full List of Resources:
> * No resources
> [...]
> (there are resources configured on #1 and #3)
>
> pcs cluster cib scope=nodes
> <nodes>
> <node id="1" uname="n1"/>
> <node id="3" uname="n3"/>
> <node id="2" uname="n2"/>
> </nodes>
>
> Please help me diagnose this. Where should I look for the problem? (I have
> already tried a few more things: I see nothing helpful in the log files,
> pcs --debug shows nothing suspicious, and I even tried editing the CIB
> manually.)
>
> Best regards,
>
> Piotr Szafarczyk
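
The mismatch described above (corosync lists three nodes, pacemaker's CIB on n1 lists two) can be spotted mechanically. The following is only a minimal sketch, not part of pcs: the helper names cib_unames and missing_from are hypothetical, and the sample data is copied from the n1 output quoted above rather than read from a live cluster.

```shell
# Print the uname of every <node> element read from stdin.
cib_unames() {
  sed -n 's/.*<node [^>]*uname="\([^"]*\)".*/\1/p'
}

# Print each name in list $1 that is absent from list $2
# (both arguments are space-separated node names).
missing_from() {
  for name in $1; do
    case " $2 " in
      *" $name "*) ;;                    # present in both lists
      *) printf '%s\n' "$name" ;;        # only in the first list
    esac
  done
}

# Sample data copied from the post (the view from n1).
corosync_nodes="n1 n3 n2"
cib_xml='<nodes>
<node id="1" uname="n1"/>
<node id="3" uname="n3"/>
</nodes>'

pacemaker_nodes=$(printf '%s\n' "$cib_xml" | cib_unames | tr '\n' ' ')

# Nodes known to corosync but missing from the CIB node list.
missing_from "$corosync_nodes" "$pacemaker_nodes"
```

With the sample data this prints n2. On a real node, the output of "pcs cluster cib scope=nodes" (the command used in the post) could presumably be piped into cib_unames in place of the sample XML; running the same check on each node would show which ones disagree.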