[ClusterLabs] How to force remove a cluster node?

Ken Gaillot kgaillot at redhat.com
Mon Apr 17 11:28:37 EDT 2017


On 04/13/2017 01:11 PM, Scott Greenlese wrote:
> Hi,
> 
> I need to remove some nodes from my existing pacemaker cluster which are
> currently unbootable / unreachable.
> 
> Referenced
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusternodemanage-HAAR.html#s2-noderemove-HAAR
> 
> *4.4.4. Removing Cluster Nodes*
> The following command shuts down the specified node and removes it from
> the cluster configuration file, corosync.conf, on all of the other nodes
> in the cluster. For information on removing all information about the
> cluster from the cluster nodes entirely, thereby destroying the cluster
> permanently, refer to _Section 4.6, “Removing the Cluster
> Configuration”_
> <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusterremove-HAAR.html#s2-noderemove-HAAR>.
> 
> pcs cluster node remove /node/
> 
> I ran the command with the cluster active on 3 of the 5 available
> cluster nodes (with quorum). The command fails with:
> 
> [root@zs90KP VD]# date;*pcs cluster node remove zs93kjpcs1*
> Thu Apr 13 13:40:59 EDT 2017
> *Error: pcsd is not running on zs93kjpcs1*
> 
> 
> The node was not removed:
> 
> [root@zs90KP VD]# pcs status |less
> Cluster name: test_cluster_2
> Last updated: Thu Apr 13 14:08:15 2017 Last change: Wed Apr 12 16:40:26
> 2017 by root via cibadmin on zs93KLpcs1
> Stack: corosync
> Current DC: zs90kppcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) -
> partition with quorum
> 45 nodes and 180 resources configured
> 
> Node zs95KLpcs1: UNCLEAN (offline)
> Online: [ zs90kppcs1 zs93KLpcs1 zs95kjpcs1 ]
> *OFFLINE: [ zs93kjpcs1 ]*
> 
> 
> Is there a way to force remove a node that's no longer bootable? If not,
> what's the procedure for removing a rogue cluster node?
> 
> Thank you...
> 
> Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie, N.Y.
> INTERNET: swgreenl at us.ibm.com

Yes. The pcs command is just convenient shorthand for a series of
commands, so you can do the steps by hand. First, make sure pacemaker
and corosync are stopped on the node to be removed (needed in the
general case; obviously already true here). Then remove the node from
corosync.conf on all the other nodes and restart corosync on each of
them. Finally, run "crm_node -R <nodename>" on any one active node.
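
As a rough illustration of that sequence, here is a dry-run sketch that
only prints the commands for a given node and executes nothing. The
function name print_removal_steps is made up for this example, and the
use of systemctl and the /etc/corosync/corosync.conf path are
assumptions about a typical RHEL 7 install rather than anything stated
above; adjust them for your setup.

```shell
#!/bin/sh
# Dry-run sketch: prints the manual node-removal steps, runs nothing.
# print_removal_steps is a hypothetical helper, not part of pcs or pacemaker.

print_removal_steps() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: print_removal_steps <nodename>" >&2
        return 1
    fi
    # Unquoted heredoc delimiter so that $node is expanded in the output.
    cat <<EOF
# 1. On $node, only if it is still reachable (skip if it is already down):
systemctl stop pacemaker corosync
# 2. On every remaining node: delete the $node entry from the nodelist in
#    /etc/corosync/corosync.conf (assumed default path), then:
systemctl restart corosync
# 3. On any one active node:
crm_node -R $node
EOF
}

# Example with the node name from the original report.
print_removal_steps zs93kjpcs1
```

Once the printed plan looks right for your cluster, run each command by
hand on the hosts indicated in the comments. Depending on how the
services are set up, pacemaker may need to be started again after
corosync is restarted, so check "pcs status" on each node afterwards.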




