SSTable Utilities

Several utilities in the bin and tools/bin directories operate directly on the SSTable data files stored on the filesystem of a Cassandra node.

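Most of these files have a .db extension; a few components, such as the digest and table-of-contents files, use other extensions (.crc32, .txt).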

Where are they found?

/var/lib/cassandra/data/<keyspace-name>/<table-name>-20e70b2030d511e8bdf32b06aa808981
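The directory name combines the table name with a hexadecimal suffix. A minimal sketch of splitting the two apart is shown here; the helper name table_dir_parts is hypothetical, and it assumes that the suffix after the last hyphen is the table's unique ID and that table names contain no hyphens.

```shell
# Split a table directory name such as
#   emp-898db3e033e611e893220f8186253e29
# into "<table-name> <suffix>".
# Assumption: everything after the last hyphen is the table's unique ID.
table_dir_parts() {
  dir=$(basename "$1")                           # drop the leading path
  printf '%s %s\n' "${dir%-*}" "${dir##*-}"      # text before / after the last hyphen
}
```

For example, table_dir_parts /var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29 prints "emp 898db3e033e611e893220f8186253e29".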

Some of the SSTable utilities are listed below.

1. The sstableutil utility lists the SSTable files for a given keyspace and table name.

To use the commands below, first install the Cassandra tools package:

sudo apt-get update

sudo apt-get install cassandra-tools

General form

sstableutil <keyspace-name> <table-name>

Example

sstableutil admatic emp

Output

hadoop@ip-172-31-86-204:~$ sstableutil admatic emp
WARN  04:56:51,348 Small commitlog volume detected at /var/lib/cassandra/commitlog; setting commitlog_total_space_in_mb to 5004.  You can override this in cassandra.yaml
WARN  04:56:51,352 Small cdc volume detected at /var/lib/cassandra/cdc_raw; setting cdc_total_space_in_mb to 2502.  You can override this in cassandra.yaml
WARN  04:56:51,471 Only 16.368GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Listing files...
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-CompressionInfo.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Data.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Digest.crc32
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Filter.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Index.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Statistics.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Summary.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-TOC.txt
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-CompressionInfo.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Data.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Digest.crc32
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Filter.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Index.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Statistics.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Summary.db
/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-TOC.txt
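In the listing above, each SSTable generation (mc-1, mc-2) is made up of eight component files. The following sketch counts the component files per generation, which can help spot an SSTable with missing components. The function name sstable_component_counts is hypothetical, and the sketch assumes file names follow the version-generation-format-component pattern seen in the listing (for example mc-1-big-Data.db); other Cassandra versions may name files differently.

```shell
# Read sstableutil output on stdin and print "<generation> <file count>".
# Only lines that start with "/" are treated as file paths, so the WARN
# lines and the "Listing files..." banner are ignored.
# Assumption: names look like <version>-<generation>-<format>-<component>.
sstable_component_counts() {
  awk -F/ '
    /^\// {
      split($NF, part, "-")      # $NF is the file name, e.g. mc-1-big-Data.db
      count[part[2]]++           # part[2] is the generation number
    }
    END { for (gen in count) print gen, count[gen] }
  ' | sort -n
}
```

A possible use is sstableutil admatic emp | sstable_component_counts, which for the listing above would report eight files for each of the two generations.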

2. The sstableverify utility verifies the SSTable files for a given keyspace and table name, identifying any files that exhibit errors or data corruption. It is the offline version of the nodetool verify command.

General form

sstableverify <keyspace-name> <table-name>

Example

sstableverify admatic emp

Output

hadoop@ip-172-31-86-204:~$ sstableverify admatic emp
WARN  05:10:52,854 Small commitlog volume detected at /var/lib/cassandra/commitlog; setting commitlog_total_space_in_mb to 5004.  You can override this in cassandra.yaml
WARN  05:10:52,858 Small cdc volume detected at /var/lib/cassandra/cdc_raw; setting cdc_total_space_in_mb to 2502.  You can override this in cassandra.yaml
WARN  05:10:52,967 Only 16.361GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Verifying BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Data.db') (0.047KiB)
Deserializing sstable metadata for BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Data.db')
Checking computed hash of BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Data.db')
Verifying BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Data.db') (0.104KiB)
Deserializing sstable metadata for BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Data.db')
Checking computed hash of BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Data.db')
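To confirm which data files were actually checked, the paths can be pulled out of the verification output. This is only a sketch: the function name verified_files is hypothetical, and it assumes the "Verifying BigTableReader(path='...')" message format shown in the output above, which may differ in other Cassandra versions.

```shell
# Read sstableverify output on stdin and print the path of every
# Data.db file that the tool reported as being verified.
# Assumption: each verified file is announced by a line of the form
#   Verifying BigTableReader(path='<file>') (<size>)
verified_files() {
  sed -n "s/^Verifying BigTableReader(path='\(.*\)').*/\1/p"
}
```

For instance, sstableverify admatic emp | verified_files would print the two mc-*-big-Data.db paths from the output above, which can then be compared with the Data.db files that sstableutil lists.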

3. The sstablescrub utility is the offline version of the nodetool scrub command. Because it runs offline, with Cassandra stopped on the node, it can be more effective at removing corrupted data from SSTable files. If the tool removes any corrupt rows, you will need to run a repair afterwards.

General form

sudo sstablescrub <keyspace-name> <table-name>

Example

sudo sstablescrub admatic emp

Output

hadoop@ip-172-31-86-204:~$ sudo sstablescrub admatic emp
WARN  05:18:45,169 Small commitlog volume detected at /var/lib/cassandra/commitlog; setting commitlog_total_space_in_mb to 5004.  You can override this in cassandra.yaml
WARN  05:18:45,173 Small cdc volume detected at /var/lib/cassandra/cdc_raw; setting cdc_total_space_in_mb to 2502.  You can override this in cassandra.yaml
WARN  05:18:45,277 Only 16.377GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Pre-scrub sstables snapshotted into snapshot pre-scrub-1522646326750
Scrubbing BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Data.db') (0.047KiB)
Scrub of BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-2-big-Data.db') complete: 1 rows in new sstable and 0 empty (tombstoned) rows dropped
Scrubbing BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Data.db') (0.104KiB)
Scrub of BigTableReader(path='/var/lib/cassandra/data/admatic/emp-898db3e033e611e893220f8186253e29/mc-1-big-Data.db') complete: 2 rows in new sstable and 0 empty (tombstoned) rows dropped
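The scrub output names the snapshot taken before scrubbing and reports how many rows were written to each new SSTable. A small sketch for summarizing both is given below. The function name scrub_summary is hypothetical, and the parsing assumes the exact message wording shown in the output above.

```shell
# Read sstablescrub output on stdin and print the pre-scrub snapshot
# name together with the total number of rows written to new SSTables.
# Assumption: messages are worded as in the sample output, i.e.
#   Pre-scrub sstables snapshotted into snapshot <name>
#   Scrub of ... complete: <N> rows in new sstable and ...
scrub_summary() {
  awk '
    /^Pre-scrub sstables snapshotted into snapshot / { snapshot = $NF }
    /^Scrub of .* complete: / {
      # the row count is the field just before the words "rows in"
      for (i = 1; i < NF; i++)
        if ($(i + 1) == "rows" && $(i + 2) == "in") rows += $i
    }
    END { printf "snapshot=%s rows_written=%d\n", snapshot, rows }
  '
}
```

Fed the output above, it prints "snapshot=pre-scrub-1522646326750 rows_written=3". The snapshot name is worth recording, since it identifies the copy of the SSTables as they were before the scrub.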

4. The sstablerepairedset utility marks specific SSTables as repaired or unrepaired, which enables transitioning a node to incremental repair. Because incremental repair is the default as of the 2.2 release, clusters created on 2.2 or later have no need for this tool.

5. The sstableexpiredblockers utility reveals the SSTables that prevent another SSTable from being deleted. It outputs all SSTables that are blocking other SSTables from being dropped, so you can determine why a given SSTable is still on disk.

6. The sstablelevelreset utility resets the level to 0 on a given set of SSTables, which forces those SSTables to be compacted as part of the next compaction operation.

7. The sstableofflinerelevel utility reassigns SSTable levels for tables that use the LeveledCompactionStrategy. This is useful when a large amount of data is ingested quickly, such as with a bulk import.

8. The sstablesplit utility splits SSTable files into multiple SSTables of a designated maximum size. This is useful if a major compaction has generated large SSTables that otherwise would not be compacted for a long time.
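Before splitting, it helps to know which data files are actually large. The sketch below lists the Data.db files in a table directory that exceed a size threshold. The function name split_candidates and the 50 MB default are choices made for this example only, and the -maxdepth and M size suffix assume GNU or BSD find.

```shell
# List Data.db files directly inside a table directory that are larger
# than a threshold given in megabytes (default 50, an arbitrary choice
# for this sketch).
#   $1  table directory, e.g. /var/lib/cassandra/data/admatic/emp-<id>
#   $2  size threshold in MB (optional)
# -maxdepth 1 keeps find out of the snapshots and backups subdirectories.
split_candidates() {
  find "$1" -maxdepth 1 -type f -name '*-Data.db' -size +"${2:-50}M"
}
```

The resulting paths are the files one would consider handing to sstablesplit while Cassandra is stopped; the exact options the tool accepts should be checked against its help output for the installed version.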
