Graceful Removal of a Node

A node may need to be removed from a cluster for a variety of reasons. For example, the capacity might no longer be needed, a node might require hardware maintenance, such as having more memory added, or a node might be down due to hardware failure.

Designed to be fault-tolerant, Cassandra handles node removal gracefully.

The nodetool decommission command is used for the planned removal of a live node, whereas the nodetool removenode command is used to remove a node that is already dead.
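
For reference, the general command forms look like this (the angle-bracket values are placeholders): decommission is issued to the node that is leaving, while removenode is issued to any live node and takes the Host ID of the dead node as reported by nodetool status.

nodetool -h <ip-of-node-to-decommission> -p 7199 decommission
nodetool removenode <host-id-of-dead-node>

Both are demonstrated against a real cluster below.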

Set up a 4-node cluster using SaltStack
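
As a reminder of how the cluster was brought up, here is a minimal sketch of driving the setup from the Salt master; the state name cassandra and the minion IDs node1 through node4 are assumptions, so substitute your own:

# Apply the (assumed) cassandra state to the four minions
salt 'node[1-4]' state.apply cassandra
# Confirm the Cassandra service is running on each minion
salt 'node[1-4]' service.status cassandra

Once all four nodes are up, check the ring: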

nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  159.89.164.69   108.61 KiB  256          49.3%             7e42de58-915f-457f-afa6-10613737037f  rack1
UN  139.59.95.39    117.93 KiB  256          46.2%             8f3617e8-dccf-4a0e-96e2-a05a15dc7da3  rack1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  159.89.168.195  74.88 KiB  256          55.6%             a7d3b15b-8c80-4ac0-913a-29c0a4653091  rack1
UN  139.59.72.230   75 KiB     256          48.9%             88438a1f-1040-4f03-b152-a47dbf114504  rack1

Decommissioning a Node

Decommissioning a node is the deliberate, planned removal of a live node from service.

The decommission command assigns the token ranges for which the decommissioned node was responsible to other nodes, and then streams the data for those ranges from the node being decommissioned to its new owners.

Decommissioning a node does not remove data from the decommissioned node. It simply copies data to the nodes that are now responsible for it.

nodetool -h 139.59.72.230 -p 7199 decommission
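
While the node is leaving, you can also watch the outbound streaming on the decommissioning node with netstats:

nodetool -h 139.59.72.230 -p 7199 netstats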

See that the node is leaving, as indicated by UL (Up/Leaving) in the nodetool status output:

nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  159.89.164.69   108.61 KiB  256          49.3%             7e42de58-915f-457f-afa6-10613737037f  rack1
UN  139.59.95.39    114.5 KiB  256          46.2%             8f3617e8-dccf-4a0e-96e2-a05a15dc7da3  rack1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  159.89.168.195  74.88 KiB  256          55.6%             a7d3b15b-8c80-4ac0-913a-29c0a4653091  rack1
UL  139.59.72.230   79.95 KiB  256          48.9%             88438a1f-1040-4f03-b152-a47dbf114504  rack1

After a while, see that the node is gone and that its load has been assigned to the remaining nodes:

nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  159.89.164.69   123.24 KiB  256          66.9%             7e42de58-915f-457f-afa6-10613737037f  rack1
UN  139.59.95.39    129.17 KiB  256          60.8%             8f3617e8-dccf-4a0e-96e2-a05a15dc7da3  rack1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  159.89.168.195  99.62 KiB  256          72.3%             a7d3b15b-8c80-4ac0-913a-29c0a4653091  rack1

Putting a Node Back into Service

Since decommissioning does not remove data from the node (the data is copied to the other nodes but left in place on the decommissioned node), it is best to clear the data from the decommissioned node before putting it back into service, especially if the node has been down for any length of time.

In general, it is faster to have the node join as a clean one (with no data), rather than have it join with old data that then needs to be repaired.

Clearing Data From a Node

service cassandra stop
rm -r /var/lib/cassandra

Put a Node Back into Service

mkdir -p /var/lib/cassandra/data
mkdir -p /var/lib/cassandra/commitlog
mkdir -p /var/lib/cassandra/saved_caches
mkdir -p /var/lib/cassandra/hints

chmod a+w /var/lib/cassandra/data
chmod a+w /var/lib/cassandra/commitlog
chmod a+w /var/lib/cassandra/saved_caches
chmod a+w /var/lib/cassandra/hints
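
On packaged installations the data directories are normally owned by the cassandra user, so instead of opening them to all writers you can restore ownership; this assumes the default cassandra user and group:

chown -R cassandra:cassandra /var/lib/cassandra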

service cassandra start

See the node joining the cluster:

nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  159.89.164.69   123.24 KiB  256          66.9%             7e42de58-915f-457f-afa6-10613737037f  rack1
UN  139.59.95.39    129.17 KiB  256          60.8%             8f3617e8-dccf-4a0e-96e2-a05a15dc7da3  rack1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  159.89.168.195  99.62 KiB  256          72.3%             a7d3b15b-8c80-4ac0-913a-29c0a4653091  rack1
UJ  139.59.72.230   193.88 KiB  256          ?                 596bb656-3709-4ecb-85af-9a75196d1c1e  rack1
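
Once the node finishes bootstrapping it will show as UN again. At that point the other nodes still hold copies of data for token ranges they handed back to the rejoined node; to reclaim that space, run cleanup on each of the other nodes (shown here against one of the nodes from the output above):

nodetool -h 159.89.164.69 -p 7199 cleanup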

Removing a Dead Node

Removing a dead node from the cluster reassigns the token ranges that the dead node was responsible for to other nodes in the cluster and re-replicates the data the dead node had held to those nodes from the surviving replicas.

The dead node shows as DN (Down/Normal) in nodetool status; note its Host ID, which removenode requires:

nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  159.89.164.69   118.28 KiB  256          49.3%             7e42de58-915f-457f-afa6-10613737037f  rack1
UN  139.59.95.39    129.17 KiB  256          49.2%             8f3617e8-dccf-4a0e-96e2-a05a15dc7da3  rack1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  159.89.168.195  99.62 KiB  256          54.0%             a7d3b15b-8c80-4ac0-913a-29c0a4653091  rack1
DN  139.59.72.230   69.92 KiB  256          47.5%             596bb656-3709-4ecb-85af-9a75196d1c1e  rack1

Remove the dead node by passing its Host ID to removenode, and check progress with removenode status:

nodetool removenode 596bb656-3709-4ecb-85af-9a75196d1c1e
nodetool removenode status
RemovalStatus: Removing token (-9219401015247577737). Waiting for replication confirmation from [/159.89.168.195,/139.59.95.39,/159.89.164.69].
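
If the removal stalls waiting for replication confirmation (for example, because yet another node is unreachable), it can be forced to complete; note that forcing skips the re-replication of the dead node's data, so run a repair on the affected keyspaces afterwards:

nodetool removenode force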


While the removal is in progress, the dead node shows as DL (Down/Leaving):

nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  159.89.164.69   118.28 KiB  256          49.3%             7e42de58-915f-457f-afa6-10613737037f  rack1
UN  139.59.95.39    129.17 KiB  256          49.2%             8f3617e8-dccf-4a0e-96e2-a05a15dc7da3  rack1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  159.89.168.195  99.62 KiB  256          54.0%             a7d3b15b-8c80-4ac0-913a-29c0a4653091  rack1
DL  139.59.72.230   69.92 KiB  256          47.5%             596bb656-3709-4ecb-85af-9a75196d1c1e  rack1

When the removal has finished, removenode status reports that no removals are in process:

nodetool removenode status
RemovalStatus: No token removals in process.


Once the removal completes, the dead node is gone and its token ranges and data have been taken over by the remaining nodes:

nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  159.89.164.69   123.41 KiB  256          66.9%             7e42de58-915f-457f-afa6-10613737037f  rack1
UN  139.59.95.39    134.3 KiB  256          60.8%             8f3617e8-dccf-4a0e-96e2-a05a15dc7da3  rack1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  159.89.168.195  104.75 KiB  256          72.3%             a7d3b15b-8c80-4ac0-913a-29c0a4653091  rack1
