[ClusterLabs] Cannot add a node with pcs
Tomas Jelinek
tojeline at redhat.com
Tue Jul 12 06:50:38 EDT 2022
Hi Piotr,
Based on the 'pcs cluster node add n2' and 'pcs config' outputs, pcs
added the node to your cluster successfully; that is, the corosync
config has been modified, distributed and loaded.
It looks like the problem is with pacemaker. This is a wild guess, but
maybe pacemaker wants to fence n2, which is not possible, as you
disabled stonith. In the meantime, n1 and n3 will not allow n2 to join
until it is confirmed fenced. Try checking, and posting, the output of
'pcs status --full' and the pacemaker log.
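To make the mismatch between the two views explicit, here is a minimal
sketch that compares the <nodes> sections you posted from n1 and n2. It
assumes the output of 'pcs cluster cib scope=nodes' has been saved from
each node as text; the function name cib_node_names is hypothetical and
only for illustration, not part of pcs or pacemaker.

```python
import xml.etree.ElementTree as ET


def cib_node_names(nodes_xml):
    """Return the set of uname values found in a CIB <nodes> section.

    Hypothetical helper for illustration; not part of pcs or pacemaker.
    """
    root = ET.fromstring(nodes_xml.strip())
    return {node.get("uname") for node in root.iter("node")}


# <nodes> sections copied from the original message.
cib_on_n1 = """
<nodes>
  <node id="1" uname="n1"/>
  <node id="3" uname="n3"/>
</nodes>
"""

cib_on_n2 = """
<nodes>
  <node id="1" uname="n1"/>
  <node id="3" uname="n3"/>
  <node id="2" uname="n2"/>
</nodes>
"""

seen_by_n1 = cib_node_names(cib_on_n1)
seen_by_n2 = cib_node_names(cib_on_n2)

# Nodes present in one CIB but missing from the other.
only_on_n2 = sorted(seen_by_n2 - seen_by_n1)
only_on_n1 = sorted(seen_by_n1 - seen_by_n2)

print("only in n2's CIB:", only_on_n2)
print("only in n1's CIB:", only_on_n1)
```

If the two sets differ, the nodes are not sharing one CIB, which would
be consistent with n2 not having joined the pacemaker cluster yet.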
With stonith disabled, you have a (seemingly) working cluster. That
lasts only until an event occurs which requires working stonith for the
cluster to recover.
Regards,
Tomas
On 12. 07. 22 at 12:34, Piotr Szafarczyk wrote:
> Hi,
>
> I used to have a working cluster with 3 nodes (and stonith disabled).
> After an unexpected restart of one node, the cluster split. Node #2
> started to see the others as unclean. Nodes 1 and 3 were cooperating
> with each other, showing #2 as offline. There were no network connection
> problems.
>
> I removed #2 (operating from #1) with
> pcs cluster node remove n2
>
> I verified that it had removed all configuration from #2, both for
> corosync and for pacemaker. The cluster appears to be working correctly
> with two nodes (and no traces of #2).
>
> Now I am trying to add the third node back.
> pcs cluster node add n2
> Disabling SBD service...
> n2: sbd disabled
> Sending 'corosync authkey', 'pacemaker authkey' to 'n2'
> n2: successful distribution of the file 'corosync authkey'
> n2: successful distribution of the file 'pacemaker authkey'
> Sending updated corosync.conf to nodes...
> n3: Succeeded
> n2: Succeeded
> n1: Succeeded
> n3: Corosync configuration reloaded
>
> I am able to start #2 operating from #1
>
> pcs cluster pcsd-status
> n2: Online
> n3: Online
> n1: Online
>
> pcs cluster enable n2
> pcs cluster start n2
>
> I can see that corosync's configuration has been updated, but
> pacemaker's has not.
>
> _Checking from #1:_
>
> pcs config
> Cluster Name: n
> Corosync Nodes:
> n1 n3 n2
> Pacemaker Nodes:
> n1 n3
> [...]
>
> pcs status
> * 2 nodes configured
> Node List:
> * Online: [ n1 n3 ]
> [...]
>
> pcs cluster cib scope=nodes
> <nodes>
> <node id="1" uname="n1"/>
> <node id="3" uname="n3"/>
> </nodes>
>
> _#2 is seeing the state differently:_
>
> pcs config
> Cluster Name: n
> Corosync Nodes:
> n1 n3 n2
> Pacemaker Nodes:
> n1 n2 n3
>
> pcs status
> * 3 nodes configured
> Node List:
> * Online: [ n2 ]
> * OFFLINE: [ n1 n3 ]
> Full List of Resources:
> * No resources
> [...]
> (there are resources configured on #1 and #3)
>
> pcs cluster cib scope=nodes
> <nodes>
> <node id="1" uname="n1"/>
> <node id="3" uname="n3"/>
> <node id="2" uname="n2"/>
> </nodes>
>
> Please help me diagnose it. Where should I look for the problem? (I have
> already tried a few more things: I see nothing helpful in the log files,
> pcs --debug shows nothing suspicious, and I even tried editing the CIB
> manually.)
>
> Best regards,
>
> Piotr Szafarczyk