Scenario
A manufacturing company needs to track its production timeline by combining manufacturing records with product details and numbering each entry chronologically.
Task
Write a Snowflake SQL query that:
- Joins
{{ ref("production_records") }} with {{ ref("products") }} on product_id using an INNER JOIN
- Assigns a sequential
row_number to each row ordered by manufacturing_date in ascending order using ROW_NUMBER()
- Returns the columns:
product_id, manufacturing_date, manufacturing_location, product_name, product_type, and row_number
Schema
production_records
| Column |
Type |
Description |
| product_id |
String |
Unique product identifier |
| manufacturing_date |
Date |
Date the product was manufactured |
| manufacturing_location |
String |
Factory or site where manufacturing took place |
products
| Column |
Type |
Description |
| product_id |
String |
Unique product identifier (matches production_records) |
| product_name |
String |
Name of the product |
| product_type |
String |
Category or type of the product |
Example
production_records:
| product_id |
manufacturing_date |
manufacturing_location |
| A1 |
2024-03-10 |
Austin |
| A2 |
2024-03-08 |
Portland |
| A3 |
2024-03-12 |
Austin |
| A4 |
2024-03-06 |
Chandler |
| A5 |
2024-03-15 |
Portland |
products:
| product_id |
product_name |
product_type |
| A1 |
Core_i7 |
Processor |
| A2 |
Arc_A770 |
GPU |
| A3 |
Xeon_W |
Processor |
| A4 |
Optane_900 |
Storage |
| A5 |
NUC_13 |
Mini_PC |
Expected Output:
| product_id |
manufacturing_date |
manufacturing_location |
product_name |
product_type |
row_number |
| A4 |
2024-03-06 |
Chandler |
Optane_900 |
Storage |
1 |
| A2 |
2024-03-08 |
Portland |
Arc_A770 |
GPU |
2 |
| A1 |
2024-03-10 |
Austin |
Core_i7 |
Processor |
3 |
| A3 |
2024-03-12 |
Austin |
Xeon_W |
Processor |
4 |
| A5 |
2024-03-15 |
Portland |
NUC_13 |
Mini_PC |
5 |
Note: The row_number is a global sequential counter ordered by manufacturing_date. It does not reset per product or location. The earliest date gets row 1, the next date gets row 2, and so on.
WITH joined AS (
SELECT
pr.product_id,
pr.manufacturing_date,
pr.manufacturing_location,
p.product_name,
p.product_type
FROM {{ ref("production_records") }} pr
INNER JOIN {{ ref("products") }} p
ON pr.product_id = p.product_id
)
SELECT
product_id,
manufacturing_date,
manufacturing_location,
product_name,
product_type,
ROW_NUMBER() OVER (ORDER BY manufacturing_date ASC) AS row_number
FROM joined
Explanation
Step 1: Join the two tables with a CTE
WITH joined AS (
SELECT
pr.product_id,
pr.manufacturing_date,
pr.manufacturing_location,
p.product_name,
p.product_type
FROM {{ ref("production_records") }} pr
INNER JOIN {{ ref("products") }} p
ON pr.product_id = p.product_id
)
An INNER JOIN combines rows from both tables where product_id matches. Products that exist in only one table are excluded from the result. The CTE explicitly lists the five base columns so the next step can add the window function cleanly.
Step 2: Assign sequential row numbers
ROW_NUMBER() OVER (ORDER BY manufacturing_date ASC) AS row_number
ROW_NUMBER() is a window function that assigns a unique sequential integer to each row. The OVER (ORDER BY manufacturing_date ASC) clause determines the numbering order. Because there is no PARTITION BY, the counter runs across the entire result set (the earliest manufacturing date gets 1, the next gets 2, and so on).
Step 3: Why use a CTE instead of a single query?
You cannot reference a window function alias in the same SELECT clause where it is defined. By computing the join in a CTE first, the outer query can cleanly apply ROW_NUMBER() without ambiguity. Snowflake also does not allow window functions in WHERE clauses, so a CTE is the standard pattern when you need to filter or build on window results later.
Alternative: Single query without CTE
SELECT
pr.product_id,
pr.manufacturing_date,
pr.manufacturing_location,
p.product_name,
p.product_type,
ROW_NUMBER() OVER (ORDER BY pr.manufacturing_date ASC) AS row_number
FROM {{ ref("production_records") }} pr
INNER JOIN {{ ref("products") }} p
ON pr.product_id = p.product_id
This version skips the CTE and applies ROW_NUMBER directly in the same query. Explicit column listing avoids duplicate product_id columns that SELECT * would produce when joining two tables with a shared column name.