[ClusterLabs] How to force remove a cluster node?

Scott Greenlese swgreenl at us.ibm.com
Tue Apr 18 17:52:48 UTC 2017


My thanks to both Ken Gaillot and Tomas Jelinek for the workaround.   The
procedure(s) worked like a champ.

I just have a few side comments / observations ...

First - Tomas, in the bugzilla you show this error message from your
cluster node remove command, directing you to use the --force option:

[root@rh72-node1:~]# pcs cluster node remove rh72-node3
Error: pcsd is not running on rh72-node3, use --force to override

When I issue the cluster node remove, I do not get any reference to the
--force option in the error message:

[root@zs93kl ]# pcs cluster node remove zs95KLpcs1
Error: pcsd is not running on zs95KLpcs1
[root@zs93kl ]#

The man page at my level doesn't mention --force.  Is this a feature
added after pcs-0.9.143-15.el7_2.ibm.2.s390x?

Also, in your workaround procedure you have me run 'pcs cluster
localnode remove <name_of_node_to_be_removed>'.  However, I'm wondering
why the 'localnode' option is not in the pcs man page for the pcs
cluster command?  The command / option worked great; I'm just curious
why it's not documented ...

[root@zs93kl]# pcs cluster localnode remove zs93kjpcs1
zs93kjpcs1: successfully removed!

My man page level:

[root@zs93kl VD]# rpm -q --whatprovides /usr/share/man/man8/pcs.8.gz
pcs-0.9.143-15.el7_2.ibm.2.s390x
[root@zs93kl VD]#

Thanks again,

Scott G.

Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie,
N.Y.
  INTERNET:  swgreenl at us.ibm.com




From:	Tomas Jelinek <tojeline at redhat.com>
To:	users at clusterlabs.org
Date:	04/18/2017 09:04 AM
Subject:	Re: [ClusterLabs] How to force remove a cluster node?



On 17.4.2017 17:28, Ken Gaillot wrote:
> On 04/13/2017 01:11 PM, Scott Greenlese wrote:
>> Hi,
>>
>> I need to remove some nodes from my existing pacemaker cluster which are
>> currently unbootable / unreachable.
>>
>> Referenced
>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusternodemanage-HAAR.html#s2-noderemove-HAAR
>>
>> *4.4.4. Removing Cluster Nodes*
>> The following command shuts down the specified node and removes it from
>> the cluster configuration file, corosync.conf, on all of the other nodes
>> in the cluster. For information on removing all information about the
>> cluster from the cluster nodes entirely, thereby destroying the cluster
>> permanently, refer to _Section 4.6, “Removing the Cluster Configuration”_
>> <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusterremove-HAAR.html#s2-noderemove-HAAR>.
>>
>> pcs cluster node remove /node/
>>
>> I ran the command with the cluster active on 3 of the 5 available
>> cluster nodes (with quorum). The command fails with:
>>
>> [root@zs90KP VD]# date; *pcs cluster node remove zs93kjpcs1*
>> Thu Apr 13 13:40:59 EDT 2017
>> *Error: pcsd is not running on zs93kjpcs1*
>>
>>
>> The node was not removed:
>>
>> [root@zs90KP VD]# pcs status |less
>> Cluster name: test_cluster_2
>> Last updated: Thu Apr 13 14:08:15 2017 Last change: Wed Apr 12 16:40:26
>> 2017 by root via cibadmin on zs93KLpcs1
>> Stack: corosync
>> Current DC: zs90kppcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) -
>> partition with quorum
>> 45 nodes and 180 resources configured
>>
>> Node zs95KLpcs1: UNCLEAN (offline)
>> Online: [ zs90kppcs1 zs93KLpcs1 zs95kjpcs1 ]
>> *OFFLINE: [ zs93kjpcs1 ]*
>>
>>
>> Is there a way to force remove a node that's no longer bootable? If not,
>> what's the procedure for removing a rogue cluster node?
>>
>> Thank you...
>>
>> Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie, N.Y.
>> INTERNET: swgreenl at us.ibm.com
>
> Yes, the pcs command is just a convenient shorthand for a series of
> commands. You want to ensure pacemaker and corosync are stopped on the
> node to be removed (in the general case, obviously already done in this
> case), remove the node from corosync.conf and restart corosync on all
> other nodes, then run "crm_node -R <nodename>" on any one active node.
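
In shell terms, that manual sequence would look roughly like the
following sketch (node name taken from this thread; the corosync.conf
edit itself is done by hand in an editor):

# On every remaining node: delete the zs93kjpcs1 entry from the
# nodelist section of /etc/corosync/corosync.conf, then restart
# corosync so it picks up the change:
systemctl restart corosync

# On any one active node, remove the node from pacemaker:
crm_node -R zs93kjpcs1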

Hi Scott,

It is possible to remove an offline node from a cluster with upstream
pcs 0.9.154 or RHEL pcs-0.9.152-5 (available in RHEL7.3) or newer.
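
With one of those versions, the removal can be forced directly when
pcsd is unreachable on the node, something like (as in the bugzilla):

pcs cluster node remove rh72-node3 --force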

If you have an older version, here's a workaround:
1. run 'pcs cluster localnode remove <nodename>' on all remaining nodes
2. run 'pcs cluster reload corosync' on one node
3. run 'crm_node -R <nodename> --force' on one node
It's basically the same procedure Ken described.
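
Spelled out with the node name from this thread, that comes to:

# 1. on each remaining node, drop the dead node from its local
#    corosync.conf:
pcs cluster localnode remove zs93kjpcs1

# 2. on one node, make the running corosync instances re-read the
#    updated config:
pcs cluster reload corosync

# 3. on one node, remove the node from pacemaker's node cache:
crm_node -R zs93kjpcs1 --force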

See https://bugzilla.redhat.com/show_bug.cgi?id=1225423 for more details.

Regards,
Tomas
