Hi all...

When I try to add a previously removed cluster node back into my pacemaker cluster, I get the following error:

[root@zs93kl]# pcs cluster node add zs95KLpcs1,zs95KLpcs2
Error: Unable to add 'zs95KLpcs1' to cluster: node is already in a cluster

The node I am adding was recently removed from the cluster, but apparently the removal was incomplete.

I am looking for some help to thoroughly remove zs95KLpcs1 from this (or any other) cluster that this host may be a part of.


Background:

I had removed the node (zs95KLpcs1) from my 3-node, single-ring-protocol pacemaker cluster while that node (which happens to be a KVM on System Z Linux host) was deactivated / shut down due to relentless, unsolicited STONITH events. My thought was that there was some issue with the ring0 interface (on vlan1293) causing the cluster to initiate fence (power off) actions just minutes after the node joined the cluster. That's why I went ahead and deactivated that node.

The first procedure I used to remove zs95KLpcs1 was flawed, because I forgot that there's an issue with attempting to remove an unreachable cluster node on the older pacemaker code:

[root@zs95kj ]# date;pcs cluster node remove zs95KLpcs1
Tue Jun 27 18:28:23 EDT 2017
Error: pcsd is not running on zs95KLpcs1

I then followed this procedure (courtesy of Tomas and Ken in this user group); a rough scripted version of it is sketched right after the list:

1. run 'pcs cluster localnode remove <nodename>' on all remaining nodes
2. run 'pcs cluster reload corosync' on one node
3. run 'crm_node -R <nodename> --force' on one node
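The sketch below is just my own shorthand for those three steps, run from one surviving node; the hostnames are mine and the ssh loop is not part of the procedure as given, so adjust as needed:

#!/bin/bash
# Rough sketch only -- run from one surviving cluster node.
REMAINING="zs95kjpcs1 zs93KLpcs1"   # remaining cluster members
TARGET="zs95KLpcs1"                 # node being removed

# Step 1: drop the node's entry from corosync.conf on every remaining node
for n in $REMAINING; do
    ssh root@"$n" "/usr/sbin/pcs cluster localnode remove $TARGET"
done

# Step 2: have the running corosync re-read the updated corosync.conf (one node)
/usr/sbin/pcs cluster reload corosync

# Step 3: purge the node from pacemaker's membership cache and CIB (one node)
/usr/sbin/crm_node -R "$TARGET" --force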
My execution:

I made the mistake of manually removing the target node's (zs95KLpcs1) stanza from the corosync.conf file before executing the above procedure:

[root@zs95kj ]# vi /etc/corosync/corosync.conf

Removed this stanza:

    node {
        ring0_addr: zs95KLpcs1
        nodeid: 3
    }

I then followed the recommended steps ...

[root@zs95kj ]# pcs cluster localnode remove zs95KLpcs1
Error: unable to remove zs95KLpcs1    ### I assume this was because I had manually removed the stanza (above)

[root@zs93kl ]# pcs cluster localnode remove zs95KLpcs1
zs95KLpcs1: successfully removed!
[root@zs93kl ]#

[root@zs95kj ]# pcs cluster reload corosync
Corosync reloaded
[root@zs95kj ]#

[root@zs95kj ]# crm_node -R zs95KLpcs1 --force
[root@zs95kj ]#


[root@zs95kj ]# pcs status |less
Cluster name: test_cluster_2
Last updated: Tue Jun 27 18:39:14 2017          Last change: Tue Jun 27 18:38:56 2017 by root via crm_node on zs95kjpcs1
Stack: corosync
Current DC: zs93KLpcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) - partition with quorum
45 nodes and 227 resources configured

Online: [ zs93KLpcs1 zs95kjpcs1 ]


This seemed to work well; at least pcs status now shows only the two remaining cluster nodes.

Later on, once I was able to activate zs95KLpcs1 (the former cluster member), I did what I thought I should do to tell that node that it's no longer a member of the cluster:

[root@zs95kj ]# cat neuter.sh
ssh root@zs95KL "/usr/sbin/pcs cluster localnode remove zs95KLpcs1"
ssh root@zs95KL "/usr/sbin/pcs cluster reload corosync"
ssh root@zs95KL "/usr/sbin/crm_node -R zs95KLpcs1 --force"

[root@zs95kj ]# ./neuter.sh
zs95KLpcs1: successfully removed!
Corosync reloaded
[root@zs95kj ]#
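In hindsight, I wonder whether that was thorough enough. A more drastic way to reset the removed node itself (which I have NOT run, so treat it as a guess rather than a verified fix) would be something along these lines; note that 'pcs cluster destroy' kills the cluster processes on the node it runs on and deletes that node's local cluster configuration files, so it should only be pointed at a node you really want out of the cluster:

ssh root@zs95KL "/usr/sbin/pcs cluster destroy"    # wipe the local cluster config on the removed node
ssh root@zs95KL "systemctl restart pcsd"           # probably unnecessary; just in case pcsd is holding stale state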
Next, I followed a procedure to convert my current 2-node, single-ring cluster to RRP, which seems to be running well, and the corosync config looks like this:

[root@zs93kl ]# for host in zs95kjpcs1 zs93KLpcs1 ; do ssh $host "hostname;corosync-cfgtool -s"; done
zs95kj
Printing ring status.
Local node ID 2
RING ID 0
        id      = 10.20.93.12
        status  = ring 0 active with no faults
RING ID 1
        id      = 10.20.94.212
        status  = ring 1 active with no faults

zs93kl
Printing ring status.
Local node ID 5
RING ID 0
        id      = 10.20.93.13
        status  = ring 0 active with no faults
RING ID 1
        id      = 10.20.94.213
        status  = ring 1 active with no faults
[root@zs93kl ]#


So now, when I try to add zs95KLpcs1 (and the second ring interface, zs95KLpcs2) to the RRP config, I get this error:

[root@zs93kl]# pcs cluster node add zs95KLpcs1,zs95KLpcs2
Error: Unable to add 'zs95KLpcs1' to cluster: node is already in a cluster


I re-ran the node removal procedures, and also deleted /etc/corosync/corosync.conf on the target node zs95KLpcs1, and nothing I've tried resolves my problem.

I checked to see if zs95KLpcs1 exists in any "corosync.conf" file on the 3 nodes, and it does not:

[root@zs95kj corosync]# grep zs95KLpcs1 *
[root@zs95kj corosync]#

[root@zs93kl corosync]# grep zs95KLpcs1 *
[root@zs93kl corosync]#

[root@zs95KL corosync]# grep zs95KLpcs1 *
[root@zs95KL corosync]#
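At this point the only other thing I can think of is that something besides corosync.conf still remembers the old membership on the node being added. Purely as a guess (I don't know exactly what 'pcs cluster node add' checks on the target node), these are the standard pacemaker / corosync / pcsd state directories I plan to look through next on zs95KLpcs1:

ssh root@zs95KL "ls -l /var/lib/pacemaker/cib/"                    # leftover CIB copies
ssh root@zs95KL "ls -l /var/lib/corosync/"                         # corosync runtime state
ssh root@zs95KL "ls -l /var/lib/pcsd/"                             # pcsd settings and tokens
ssh root@zs95KL "grep -r zs95KLpcs1 /var/lib/pcsd/ 2>/dev/null"    # any stale reference to the node name

If anyone knows which of these (if any) actually matters to pcs here, I'd appreciate a pointer.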
Thanks in advance ..

Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie, N.Y.
  INTERNET: swgreenl@us.ibm.com