Cassandra Query Language (CQL) - CREATE TABLE

This is some syntax to work with Cassandra.

CREATE TABLE

Required parameters:

table_name: name of the table to index

column_name: name of the column

column_definition

Restriction:

  • A table must have at least one PRIMARY KEY
  • When PRIMARY KEY is at the end of a column definition, that column is the only primary key for the table, and is defined as the partition key (*)
  • A static column cannot be a primary key (**)
  • Primary keys can include frozen collections
So, let's explain (*):

When you create a table in SQL (especially in Cassandra or similar NoSQL databases), you can define a primary key in two ways: 

After the column, like this:

CREATE TABLE users (

    id UUID PRIMARY KEY,

    name text,

    email text

);

Or at the end, like this:

CREATE TABLE users (

    id UUID,

    name text,

    email text,

    PRIMARY KEY (id)

);

Both definitions mean the same thing: id is the only primary key column.

Why does this matter? 

In databases like Cassandra, the primary key has two parts:

  1. Partition Key – decides how data is distributed across nodes

  2. Clustering Columns – decide how data is sorted inside a partition

        When you write:

        id UUID PRIMARY KEY

        or:

        PRIMARY KEY (id)

You are defining only one column (id) as the primary key. If there's only one primary key column, it automatically becomes the partition key.

Partition key = the column used to split and distribute data across nodes

Example:

Define table:

CREATE TABLE orders (

    order_id UUID PRIMARY KEY,

    customer_id UUID,

    amount decimal

);

This means:

  • order_id is the only primary key

  • order_id is also the partition key

  • The table has no clustering columns

 If the primary key has more than one column

Then you must use the end-of-table syntax:

CREATE TABLE orders (
    customer_id UUID,
    order_id UUID,
    amount decimal,
    PRIMARY KEY (customer_id, order_id)
);

Here:

  • customer_id = partition key

  • order_id = clustering key


Let's explain (**):

What is the static column?

Cassandra tables are organized like this:

  • Partition key → identifies a group of rows (a partition)

  • Clustering columns → define multiple rows within that partition

A static column stores values that belong to the whole partition, not to individual rows.

It acts like metadata or shared properties for all rows inside the same partition.

Example (without static column)

CREATE TABLE orders (

    customer_id UUID,         -- partition key

    order_id UUID,            -- clustering column

    amount decimal,

    PRIMARY KEY (customer_id, order_id)

);

Each order_id row has its own values.

Example (with static column)

CREATE TABLE orders (
    customer_id UUID,
    order_id UUID,
    customer_name text static,
    amount decimal,
    PRIMARY KEY (customer_id, order_id)
);
 

What happens here?

  • customer_name is static

  • So for all rows in the same customer partition, customer_name is shared.

Partition example:

customer_id order_id amount customer_name
C100 O1 20.0 Alice
C100 O2 35.0 Alice
C100 O3 50.0 Alice

We only store customer_name once per partition, not once per row.

When to use static columns?

Use them when you have data that:

  • Belongs to the entire partition (not specific to each row)

  • Changes rarely or remain constant

Examples:

Table Good static column
Posts with comments post_title, post_author
Orders by customer customer_name, customer_status
IoT time-series data device_model, firmware_version

Reference: CREATE TABLE | Apache Cassandra Documentation


Nhận xét

Bài đăng phổ biến từ blog này

GIT Command Lines

How to check if your IP can connect to another address