Partitioners
A partitioner determines how data is distributed across the nodes in the cluster, including replicas. A partitioner is a function that derives a token for a row from the row's partition key, typically by hashing the key. Each row is then placed in the cluster according to the value of its token.
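The placement rule can be sketched in Python. This is a minimal sketch under stated assumptions. The four-node ring, its tokens, the toy 32-bit token space, and the helper names token_for and owner are all hypothetical, and replica placement is left out. Each node owns the range of tokens up to and including its own token. A row goes to the first node whose token is greater than or equal to the row's token, wrapping around to the first node.

```python
import bisect
import hashlib

# Hypothetical 4-node ring in a toy 0..2**32-1 token space (not a real
# partitioner's range). Each node owns the tokens up to and including its own.
RING = {
    1 * 2**30: "node1",
    2 * 2**30: "node2",
    3 * 2**30: "node3",
    4 * 2**30 - 1: "node4",
}
SORTED_TOKENS = sorted(RING)


def token_for(partition_key: str) -> int:
    """Derive a token by hashing the partition key (toy 32-bit hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def owner(token: int) -> str:
    """Return the first node whose token is >= the row's token, wrapping around."""
    i = bisect.bisect_left(SORTED_TOKENS, token)
    if i == len(SORTED_TOKENS):
        i = 0  # past the last node's token: wrap to the first node
    return RING[SORTED_TOKENS[i]]


if __name__ == "__main__":
    for key in ["alice", "bob", "carol"]:
        t = token_for(key)
        print(key, t, owner(t))
```

Because the token depends only on the partition key, every row with the same partition key lands on the same node.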
Cassandra offers the following partitioners, which are set with the partitioner option in the cassandra.yaml file:
- Murmur3Partitioner (default): uniformly distributes data across the cluster based on MurmurHash hash values.
- RandomPartitioner: uniformly distributes data across the cluster based on MD5 hash values.
- ByteOrderedPartitioner: keeps an ordered distribution of data lexically by key bytes.
The RandomPartitioner uses a cryptographic hash (MD5), which takes longer to compute than the MurmurHash used by the Murmur3Partitioner. Because Cassandra does not need a cryptographic hash, using the Murmur3Partitioner results in a 3-5 times improvement in performance.
You cannot change the partitioner of an existing cluster.
Note: If you use virtual nodes (vnodes), you do not need to calculate tokens. If you do not use vnodes, you must calculate a token for each node and assign it to the initial_token parameter in that node's cassandra.yaml file. See Generating tokens and use the method for your partitioner.
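Without vnodes, a common approach is to space the initial tokens evenly across the partitioner's token range. The sketch below assumes that even spacing; the function names are hypothetical, and the ranges are the ones this section gives for each partitioner.

```python
def murmur3_initial_tokens(node_count: int) -> list[int]:
    """Evenly spaced tokens over the Murmur3Partitioner range [-2**63, 2**63 - 1]."""
    # Integer division keeps every token an exact integer.
    return [(2**64 // node_count) * i - 2**63 for i in range(node_count)]


def random_initial_tokens(node_count: int) -> list[int]:
    """Evenly spaced tokens over the RandomPartitioner range [0, 2**127 - 1]."""
    return [(2**127 // node_count) * i for i in range(node_count)]


if __name__ == "__main__":
    for i, tok in enumerate(murmur3_initial_tokens(4)):
        print(f"node {i + 1}: initial_token: {tok}")
```

For four nodes, the loop prints -9223372036854775808, -4611686018427387904, 0 and 4611686018427387904; each value goes into the initial_token parameter of one node's cassandra.yaml.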
Murmur3Partitioner
The Murmur3Partitioner uses the MurmurHash function. This hashing function creates a 64-bit hash value of the partition key. The possible range of hash values is from -2^63 to +2^63-1.
When using the Murmur3Partitioner, you can page through all rows using the token function in a CQL query.
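Both points can be sketched in Python. Python's standard library has no MurmurHash3, so as an explicit assumption the sketch uses BLAKE2b truncated to 8 bytes as a stand-in hash: its tokens share the Murmur3Partitioner's signed 64-bit range but will not match the tokens Cassandra computes. The hypothetical page_by_token helper imitates the common CQL pattern SELECT ... WHERE token(k) > ? LIMIT n, resuming each page after the last token seen.

```python
import hashlib

MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def token(partition_key: str) -> int:
    """Signed 64-bit token in [-2**63, 2**63 - 1].

    Stand-in hash: BLAKE2b truncated to 8 bytes, NOT MurmurHash3, so the
    values differ from Cassandra's real tokens; only the range matches.
    """
    digest = hashlib.blake2b(partition_key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def page_by_token(keys, page_size):
    """Yield pages of keys in token order, mimicking
    SELECT ... WHERE token(k) > :last LIMIT :page_size."""
    ordered = sorted(keys, key=token)
    last = None  # no lower bound for the first page
    while True:
        page = [k for k in ordered if last is None or token(k) > last][:page_size]
        if not page:
            return
        yield page
        last = token(page[-1])  # the next page starts after this token


if __name__ == "__main__":
    keys = [f"user{i}" for i in range(25)]
    for n, page in enumerate(page_by_token(keys, 10), start=1):
        print(f"page {n}: {len(page)} rows, last token {token(page[-1])}")
```

Paging by token visits every partition exactly once, in token order rather than key order.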
RandomPartitioner
The RandomPartitioner was the default partitioner prior to Cassandra 1.2. It is included for backwards compatibility.
The RandomPartitioner distributes data evenly across the nodes using an MD5 hash value of the partition key. The possible range of hash values is from 0 to 2^127-1.
When using the RandomPartitioner, you can page through all rows using the token function in a CQL query.
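A RandomPartitioner-style token can be sketched with Python's hashlib. As an assumption, the sketch reads the 16-byte MD5 digest as a signed big-endian integer and takes its absolute value to obtain a non-negative token. Treat it as an illustration of the idea, not a byte-for-byte reproduction of Cassandra's implementation; the function name is hypothetical.

```python
import hashlib


def random_partitioner_token(partition_key: bytes) -> int:
    """Non-negative token from the MD5 digest of the partition key bytes.

    Assumption: the 16-byte (128-bit) digest is read as a signed big-endian
    integer and abs() folds it into the non-negative range.
    """
    digest = hashlib.md5(partition_key).digest()  # 16 bytes = 128 bits
    return abs(int.from_bytes(digest, "big", signed=True))


if __name__ == "__main__":
    keys = [b"alice", b"bob", b"carol"]
    # A full scan using the token function returns partitions in token order.
    for key in sorted(keys, key=random_partitioner_token):
        print(key, random_partitioner_token(key))
```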
ByteOrderedPartitioner
Cassandra provides the ByteOrderedPartitioner for ordered partitioning. It is included for backwards compatibility. This partitioner orders rows lexically by key bytes.
The ordered partitioner allows ordered scans by primary key. Although range scans over rows sound like a desirable feature of ordered partitioners, you can achieve the same functionality using table indexes.
Using an ordered partitioner is not recommended for the following reasons:
- Difficult load balancing
- Sequential writes can cause hot spots
- Uneven load balancing for multiple tables
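The hot-spot problem can be demonstrated with a small Python simulation. The four-node ring that splits the byte space by the first key byte, the event-NNNNNN key format, and the use of MD5 to stand in for a hashing partitioner are all hypothetical choices for illustration. With the ByteOrderedPartitioner the token is the key's bytes, so sequential keys that share a prefix all land in one node's range; hashing the same keys spreads them across the ring.

```python
import bisect
import hashlib
from collections import Counter

# Hypothetical 4-node ring splitting the first key byte (0x00-0xff) evenly.
NODE_TOKENS = [b"\x40", b"\x80", b"\xc0", b"\xff"]
NODES = ["node1", "node2", "node3", "node4"]


def owner(token: bytes) -> str:
    """First node whose token is >= the row token (lexical byte order), wrapping."""
    i = bisect.bisect_left(NODE_TOKENS, token)
    return NODES[i % len(NODES)]  # index 4 means past the last token: wrap to node1


def distribution(keys, hashed: bool) -> Counter:
    """Count rows per node, using raw key bytes (byte-ordered) or an MD5 token."""
    counts = Counter()
    for key in keys:
        raw = key.encode("utf-8")
        tok = hashlib.md5(raw).digest() if hashed else raw  # byte-ordered: token = key bytes
        counts[owner(tok)] += 1
    return counts


# Sequential, time-ordered keys: "event-000000", "event-000001", ...
keys = [f"event-{i:06d}" for i in range(1000)]
print("byte-ordered:", distribution(keys, hashed=False))
print("hashed (MD5):", distribution(keys, hashed=True))
```

With these keys, the byte-ordered run places all 1000 rows on node2, because every key starts with "e" (0x65), which sorts between 0x40 and 0x80. Note also that lexical byte order is not numeric order: b"10" sorts before b"9".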