Types of Keys
Defining a basic primary key
The primary key consists of only the partition key in this case. Data stored with a simple primary key will be fast to insert and retrieve if many values for the column can distribute the partitions across many nodes.
Keep in mind that only the primary key can be specified when retrieving data from the table (unless you use secondary indexes).
CREATE KEYSPACE IF NOT EXISTS cycling WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 1};
USE cycling;
A simple primary key table can be created in two different ways, as shown.
CREATE TABLE cyclist_name ( id UUID PRIMARY KEY, lastname text, firstname text );
CREATE TABLE cyclist_name ( id UUID, lastname text, firstname text, PRIMARY KEY (id) );
INSERT INTO cyclist_name (id, lastname, firstname) VALUES (41d01b63-244e-435d-bd6d-5dc8a0addb8c, 'A', 'J');
SELECT token(id), id, lastname, firstname FROM cyclist_name;
system.token(id) | id | lastname | firstname
----------------------+--------------------------------------+----------+-----------
-2767158835560740559 | 41d01b63-244e-435d-bd6d-5dc8a0addb8c | A | J
(1 rows)
This table can be queried to discover the name of a cyclist given their ID number.
Defining a multi-column partition key
These columns form logical sets inside a partition to facilitate retrieval. Composite partition keys are used when the data stored is too large to reside in a single partition. Using more than one column for the partition key breaks the data into chunks, or buckets. This method can be effective if a cluster experiences hotspotting, or congestion in writing data to one node repeatedly, because a partition is heavily writing.
Data is retrieved using the partition key. Keep in mind that to retrieve data from the table, values for all columns defined in the partition key have to be supplied, if secondary indexes are not used.
Using a composite partition key
CREATE TABLE rank_by_year_and_name (
race_year int,
race_name text,
cyclist_name text,
rank int,
PRIMARY KEY ((race_year, race_name), rank)
);
INSERT INTO rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2000, '1', 'A', 1);
SELECT token(race_year, race_name), race_year, race_name, cyclist_name, rank from rank_by_year_and_name;
system.token(race_year, race_name) | race_year | race_name | cyclist_name | rank
------------------------------------+-----------+-----------+--------------+------
-108427357362698080 | 2000 | 1 | A | 1
(1 rows)
Use race_year and race_name for the composite partition key. The table definition shown has an additional column rank used in the primary key.
Defining a partition key with clustering columns
Clustering is a storage engine process that sorts data within each partition based on the definition of the clustering columns. Normally, columns are sorted in ascending alphabetical order.
Grouping data in tables using clustering columns is the equivalent of JOINs in a relational database, but are much more performant because only one table is accessed.
The database stores an entire row of data on a node by partition key and can order the data for retrieval with clustering columns.Retrieving data from a partition is more versatile with clustering columns. If you have more complex needs for querying, use a compound primary key.
Using a compound primary key
Use a compound primary key to create multiple columns that you can use to query and return sorted results.
If our pro cycling example was designed in a relational database, you would create a cyclists table with a foreign key to the races. You denormalize the data because joins are not performant in a distributed system.
CREATE TABLE cyclist_category (
category text,
points int,
id UUID,
lastname text,
PRIMARY KEY (category, points)
) WITH CLUSTERING ORDER BY (points DESC);
INSERT INTO cyclist_category (category, points, id, lastname) VALUES ('best', 1, 41d01b63-244e-435d-bd6d-5dc8a0addb8c, 'J');
SELECT token(category), category, points, id, lastname from cyclist_category;
system.token(category) | category | points | id | lastname
------------------------+----------+--------+--------------------------------------+----------
8664726657035972693 | best | 1 | 41d01b63-244e-435d-bd6d-5dc8a0addb8c | J
(1 rows)
SELECT token(points), category, points, id, lastname from cyclist_category;
system.token(points) | category | points | id | lastname
----------------------+----------+--------+--------------------------------------+----------
-4069959284402364209 | best | 1 | 41d01b63-244e-435d-bd6d-5dc8a0addb8c | J
(1 rows)
The table uses category for the partition key and points for a single clustering column. This table can be queried to retrieve a list of cyclists and their points in a category, sorted by points.
Ascending order is more efficient to store, but descending queries are faster due to the nature of the storage engine.
Note: The combination of the category and points uniquely identifies a row in the cyclist_category table. More than one row with the same category can exist as long as the rows contain different pointsvalues.