Cassandra Query Language (CQL) - CREATE TABLE
This is some syntax to work with Cassandra.
CREATE TABLE
Required parameters:
table_name: name of the table to index
column_name: name of the column
column_definition
Restriction:
- A table must have at least one PRIMARY KEY
- When PRIMARY KEY is at the end of a column definition, that column is the only primary key for the table, and is defined as the partition key (*)
- A static column cannot be a primary key (**)
- Primary keys can include frozen collections
When you create a table in SQL (especially in Cassandra or similar NoSQL databases), you can define a primary key in two ways:
After the column, like this:
CREATE TABLE users (
id UUID PRIMARY KEY,
name text,
email text
);
Or at the end, like this:
CREATE TABLE users (
id UUID,
name text,
email text,
PRIMARY KEY (id)
);
Both definitions mean the same thing: id is the only primary key
column.
Why does this matter?
In databases like Cassandra, the primary key has two parts:
-
Partition Key – decides how data is distributed across nodes
-
Clustering Columns – decide how data is sorted inside a partition
When you write:
id UUID
PRIMARY KEY
or:
PRIMARY KEY
(id)
You are defining only one column (id) as the primary key. If there's only one primary key column, it automatically becomes the partition key.
Partition key = the column used to split and distribute data across nodes
Example:
Define table:
CREATE TABLE orders (
order_id UUID PRIMARY KEY,
customer_id UUID,
amount decimal
);
This means:
-
order_idis the only primary key -
order_idis also the partition key -
The table has no clustering columns
If the primary key has more than one column
CREATE TABLE orders (
customer_id UUID,
order_id UUID,
amount decimal,
PRIMARY KEY (customer_id, order_id)
);
Here:
-
customer_id= partition key -
order_id= clustering key
Let's explain (**):
What is the static column?
Cassandra tables are organized like this:
-
Partition key → identifies a group of rows (a partition)
-
Clustering columns → define multiple rows within that partition
Example (without static column)
CREATE TABLE orders (
customer_id UUID, --
partition key
order_id UUID, --
clustering column
amount decimal,
PRIMARY KEY (customer_id, order_id)
);
order_id row has its own
values.
CREATE TABLE orders (
customer_id UUID,
order_id UUID,
customer_name text static,
amount decimal,
PRIMARY KEY (customer_id, order_id)
);
What happens here?
-
customer_nameis static -
So for all rows in the same customer partition,
customer_nameis shared.
| customer_id | order_id | amount | customer_name |
|---|---|---|---|
| C100 | O1 | 20.0 | Alice |
| C100 | O2 | 35.0 | Alice |
| C100 | O3 | 50.0 | Alice |
We only store
customer_name once per
partition, not once per row.
When to use static columns?
Use them when you have data that:
-
Belongs to the entire partition (not specific to each row)
-
Changes rarely or remain constant
Examples:
| Table | Good static column |
|---|---|
| Posts with comments | post_title, post_author |
| Orders by customer | customer_name, customer_status |
| IoT time-series data | device_model, firmware_version |
Reference: CREATE TABLE | Apache Cassandra Documentation
Nhận xét
Đăng nhận xét