SSTable Utilities
Several utilities in the bin and tools/bin directories operate directly on SSTable data files on the filesystem of a Cassandra node.
Most of these files have a .db extension.
Where are the SSTable files found? By default, under the data directory:
/var/lib/cassandra/data/<keyspace-name>/<table-name>-20e70b2030d511e8bdf32b06aa808981
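As a rough illustration of that layout, the sketch below lists the Data.db components of one table directly from the data directory. The function name list_data_files is hypothetical, and the sketch assumes a find that supports -maxdepth (GNU and BSD find both do).

```shell
# List the Data.db components of a table straight from the data directory.
# Arguments: data directory, keyspace name, table name.
list_data_files() {
    data_dir=$1 keyspace=$2 table=$3
    # The table directory is named <table-name>-<table-id>.
    # -maxdepth 1 keeps find out of subdirectories such as snapshots/.
    find "$data_dir/$keyspace/$table"-* -maxdepth 1 -type f -name '*-Data.db' | sort
}

# Example: list_data_files /var/lib/cassandra/data admatic emp
```

Reading the data directory may require the same privileges as the Cassandra service account.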
Some SSTable utilities are listed below.
To use the commands below, first install the Cassandra tools:
sudo apt-get update
sudo apt-get install cassandra-tools
1. The sstableutil utility lists the SSTable files for a provided keyspace and table name.
General syntax:
sstableutil <keyspace-name> <table-name>
Example:
sstableutil admatic emp
Output
hadoop@ip-172-31-86-204:~$ sstableutil admatic emp
WARN 04:56:51,348 Small commitlog volume detected at /var/lib/cassandra/commitlog; setting commitlog_total_space_in_mb to 5004. You can override this in cassandra.yaml
WARN 04:56:51,352 Small cdc volume detected at /var/lib/cassandra/cdc_raw; setting cdc_total_space_in_mb to 2502. You can override this in cassandra.yaml
WARN 04:56:51,471 Only 16.368GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Listing files...
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-CompressionInfo.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Data.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Digest.crc32
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Filter.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Index.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Statistics.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Summary.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-TOC.txt
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-CompressionInfo.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Data.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Digest.crc32
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Filter.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Index.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Statistics.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Summary.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-TOC.txt
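The listing above mixes warnings, a status line, and every component file. A minimal sketch for keeping only the Data.db paths, one per SSTable, is shown here. The helper name data_files_only is hypothetical, and it assumes the output format matches the sample above.

```shell
# Keep only the Data.db paths from sstableutil output read on stdin.
# WARN lines and "Listing files..." do not end in -Data.db, so they are dropped.
data_files_only() {
    grep -- '-Data\.db$'
}

# Example: sstableutil admatic emp | data_files_only
```

Piping the result through wc -l gives the number of SSTables for the table.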
2. The sstableverify utility verifies the SSTable files for a provided keyspace and table name, identifying any files that exhibit errors or data corruption. It is the offline version of the nodetool verify command.
General syntax:
sstableverify <keyspace-name> <table-name>
Example:
sstableverify admatic emp
Output
hadoop@ip-172-31-86-204:~$ sstableverify admatic emp
WARN 05:10:52,854 Small commitlog volume detected at /var/lib/cassandra/commitlog; setting commitlog_total_space_in_mb to 5004. You can override this in cassandra.yaml
WARN 05:10:52,858 Small cdc volume detected at /var/lib/cassandra/cdc_raw; setting cdc_total_space_in_mb to 2502. You can override this in cassandra.yaml
WARN 05:10:52,967 Only 16.361GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Verifying BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Data.db') (0.047KiB)
Deserializing sstable metadata for BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Data.db')
Checking computed hash of BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Data.db')
Verifying BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Data.db') (0.104KiB)
Deserializing sstable metadata for BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Data.db')
Checking computed hash of BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Data.db')
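To see at a glance which SSTables were checked, the paths can be pulled out of the "Verifying" lines. This is only a sketch: the helper name verified_sstables is hypothetical, and it assumes the log lines look exactly like the sample output above.

```shell
# Print the path of every SSTable that sstableverify reports as being verified.
# Reads sstableverify output on stdin.
verified_sstables() {
    sed -n "s/^Verifying BigTableReader(path='\(.*\)').*/\1/p"
}

# Example: sstableverify admatic emp | verified_sstables
```

Comparing this list with the Data.db files reported by sstableutil is one way to confirm that no SSTable was skipped.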
3. The sstablescrub utility is an offline version of the nodetool scrub command. Because it runs offline, it can be more effective at removing corrupted data from SSTable files. If the tool removes any corrupt rows, you will need to run a repair.
General syntax:
sudo sstablescrub <keyspace-name> <table-name>
Example:
sudo sstablescrub admatic emp
Output
hadoop@ip-172-31-86-204:~$ sudo sstablescrub admatic emp
WARN 05:18:45,169 Small commitlog volume detected at /var/lib/cassandra/commitlog; setting commitlog_total_space_in_mb to 5004. You can override this in cassandra.yaml
WARN 05:18:45,173 Small cdc volume detected at /var/lib/cassandra/cdc_raw; setting cdc_total_space_in_mb to 2502. You can override this in cassandra.yaml
WARN 05:18:45,277 Only 16.377GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Pre-scrub sstables snapshotted into snapshot pre-scrub-1522646326750
Scrubbing BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Data.db') (0.047KiB)
Scrub of BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Data.db') complete: 1 rows in new sstable and 0 empty (tombstoned) rows dropped
Scrubbing BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Data.db') (0.104KiB)
Scrub of BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Data.db') complete: 2 rows in new sstable and 0 empty (tombstoned) rows dropped
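The scrub output above names the snapshot taken before any file was rewritten, which is useful to record in case the original SSTables are needed again. A minimal sketch for capturing that name follows. The helper name scrub_snapshot_name is hypothetical, and it assumes the "Pre-scrub sstables snapshotted" line appears as in the sample.

```shell
# Print the name of the pre-scrub snapshot from sstablescrub output on stdin.
scrub_snapshot_name() {
    sed -n 's/^Pre-scrub sstables snapshotted into snapshot \(.*\)$/\1/p'
}

# Example: sudo sstablescrub admatic emp | scrub_snapshot_name
# If the scrub dropped corrupt rows, a repair can be started once the node
# is running again, for example: nodetool repair admatic emp
```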