
Query Real Data with ClickHouse

Run your first analytical query on a 10-million-row dataset and see why it's fast.

Create the table

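CREATE TABLE sales (
    date Date,
    city LowCardinality(String),     -- few distinct values, so stored dictionary-encoded
    product LowCardinality(String),
    price UInt32
) ENGINE = MergeTree()
ORDER BY date;                       -- sort key, which also serves as the sparse primary index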

Load 10 million rows

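INSERT INTO sales
SELECT
    -- Each rand(N) gets a distinct dummy argument so ClickHouse does not merge
    -- identical rand() calls into a single value (common subexpression elimination).
    toDate('2020-01-01') + toIntervalDay(rand(1) % 1826) AS date,                     -- 2020-01-01 to 2024-12-30
    ['London','Manchester','Birmingham','Leeds','Glasgow'][rand(2) % 5 + 1] AS city,  -- arrays are 1-indexed
    ['Laptop','Phone','Tablet','Monitor','Keyboard'][rand(3) % 5 + 1] AS product,
    rand(4) % 1900 + 100 AS price                                                      -- 100 to 1999
FROM numbers(10000000);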
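A quick sanity check after the load can confirm the row count and the spread of the generated values. This is a sketch; the aliases are illustrative. If city and product were drawn independently, `pairs` should come out as 25 (5 cities × 5 products); a value of 5 would mean both columns were derived from the same random number for every row.

```sql
-- Sanity-check the generated data (aliases are illustrative).
SELECT
    count()                  AS row_count,  -- expect 10000000
    min(date)                AS first_day,  -- within 2020-01-01 to 2024-12-30
    max(date)                AS last_day,
    uniqExact(city)          AS cities,     -- expect 5
    uniqExact(city, product) AS pairs,      -- expect 25 if city and product are independent
    min(price)               AS min_price,  -- should fall within 100 to 1999
    max(price)               AS max_price
FROM sales;
```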

Run the query

SELECT city, sum(price) AS total, count() AS orders
FROM sales
GROUP BY city
ORDER BY total DESC;
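Part of why this query is quick is that ClickHouse stores each column separately, so the GROUP BY above only has to read `city` and `price`, not `date` or `product`. One way to see how compact each column is on disk is the `system.columns` table. This is a sketch that assumes `sales` was created in your current database; exact sizes depend on the server version and settings, so no figures are shown here.

```sql
-- On-disk footprint of each column of the sales table.
SELECT
    name,
    type,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE database = currentDatabase() AND table = 'sales'
ORDER BY data_compressed_bytes DESC;
```

Comparing `compressed` with `uncompressed` shows how much the compression codec saves for each column.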

What's next

Now try these statements in a live environment: start a fresh ClickHouse instance and experiment with the queries above.
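One experiment to try: filter on `date`. Because the table is declared with `ORDER BY date`, MergeTree keeps rows sorted by date and builds a sparse primary index on it, so a date-range filter lets ClickHouse skip blocks of rows outside the range. The query below is only a suggestion; the 2024 cut-off is arbitrary.

```sql
-- Monthly revenue per product for 2024 only; the WHERE clause filters on the sort key.
SELECT
    toStartOfMonth(date) AS month,
    product,
    sum(price)           AS revenue
FROM sales
WHERE date >= '2024-01-01' AND date < '2025-01-01'
GROUP BY month, product
ORDER BY month, revenue DESC;
```

Prefixing the query with `EXPLAIN indexes = 1` should show how many parts and granules the primary key selected.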

