165. Hash Join Simulator
Beginner Mode
Problem Statement
A hash join is how databases efficiently join two tables on a shared key. It works in two phases:
- Build phase: Scan one table and load its rows into a hash map keyed by the join column.
- Probe phase: Scan the other table and look up each row's join key in the hash map to find matches.
Implement the HashJoin class that simulates this algorithm:
HashJoin()initializes the join processor.void build(rows, int keyIndex)loads the build-side table.rowsis a list of rows (each row is a list of values).keyIndexis the column index to use as the join key.List[List] probe(rows, int keyIndex)probes against the build-side table. For each probe row, find all build rows with a matching key. For each match, output a combined row:build_row + probe_row. Return all combined rows sorted lexicographically.
This is an inner join: only rows with matching keys on both sides appear in the output.
Additional information
0 <= len(rows) <= 10,000- Each row contains 1 to 10 values (integers or strings).
0 <= keyIndex < len(row)- A key may appear in multiple rows on either side (many-to-many join).
buildis called exactly once beforeprobe.- Output rows must be sorted lexicographically.
Example 1:
Input:
["HashJoin", "build", "probe"]
[[], [[[1,"Alice","Eng"],[2,"Bob","Sales"],[3,"Carol","Eng"]], 2],
[[["Eng","BldgA"],["Sales","BldgB"],["Mkt","BldgC"]], 0]]
Output:
[null, null, [[1,"Alice","Eng","Eng","BldgA"],
[2,"Bob","Sales","Sales","BldgB"],
[3,"Carol","Eng","Eng","BldgA"]]]
Explanation: Alice and Carol are in "Eng" which matches the first probe row. Bob is in "Sales" which matches the second. "Mkt" has no match on the build side.
Example 2:
Input:
["HashJoin", "build", "probe"]
[[], [[[1,"a"],[1,"b"]], 0], [[[1,"x"],[1,"y"]], 0]]
Output:
[null, null, [[1,"a",1,"x"],[1,"a",1,"y"],[1,"b",1,"x"],[1,"b",1,"y"]]]
Explanation: Key 1 appears twice on each side, producing 2x2 = 4 combined rows (Cartesian product of matching rows).
Code Environment
Sign in or try as guest to run your code.
Essential
SQL 0/33
Git 0/15
Spark 0/20
Snowflake 0/22
Python 0/24
Need more practice in this area? Explore more questions →
Apple
TCS
X
Accenture
Adobe
Google
LinkedIn
Samsung
Datadog
Wix
Dropbox
Meta
OpenAI
Hulu
Uber
DoorDash
Anthropic
Amazon
ActivisionBlizzard
Vercel
Crypto.Com
Zscaler
DeutscheBank
GoDaddy
BMW
PayPal
Snowflake
AMD
Twilio
Atlassian
JPMorgan
NVIDIA
IBM
Databricks
Coinbase
Cisco
Robinhood
Twitter
Microsoft
Palantir
Netflix
VMware
Cloudflare
Stripe
Capital One
Splunk
Intel
SAP
Tesla
GitHub
JaneStreet
Bloomberg
Salesforce
Elastic
CGI
UBS
GitLab
Ubisoft
Slack
Nintendo
EY
Kayak
Lyft
Airbnb
Walmart
Revolut
Visa
Okta
HashiCorp
Instacart
Mastercard