164. Skew-Aware Key Partitioner
Beginner Mode
Problem Statement
In distributed data processing (Spark, Kafka, MapReduce), events are partitioned across buckets by key. When certain keys are much more frequent than others ("hot keys" or "skewed keys"), one bucket becomes a bottleneck while others sit idle.
Design a partitioner that detects hot keys and spreads their events across multiple buckets to balance the load.
Implement the SkewHandler class:
SkewHandler(int numBuckets, int hotThreshold, int splitFactor)initializes the partitioner with:numBuckets-- the total number of buckets (indexed 0 to numBuckets-1)hotThreshold-- a key becomes "hot" once its total event count exceeds this valuesplitFactor-- hot keys are spread across this many consecutive buckets
int assign(string key)assigns an event to a bucket and returns the bucket index:- Compute the base bucket as
hash(key) % numBucketsusing the hash function:h = 0; for each char c in key: h = h * 31 + ascii(c); return h % numBuckets - If the key is not hot (count <= threshold): return the base bucket.
- If the key is hot (count > threshold): use round-robin across
splitFactorbuckets starting atbase. The i-th hot assignment goes to bucket(base + i % splitFactor) % numBuckets. - Each call increments the key's count.
- Compute the base bucket as
int[] getLoad()returns an array of lengthnumBucketswhere indexiis the number of events assigned to bucketi.
Additional information
1 <= numBuckets <= 1001 <= hotThreshold <= 10,0001 <= splitFactor <= numBucketskeyconsists of lowercase English letters, length 1 to 20.- At most
50,000calls will be made toassignandgetLoadcombined.
Example 1:
Input:
["SkewHandler", "assign", "assign", "assign", "assign", "assign", "assign", "getLoad"]
[[4, 3, 2], ["a"], ["a"], ["a"], ["a"], ["a"], ["a"], []]
Output: [null, 1, 1, 1, 1, 2, 1, [0, 5, 1, 0]]
Explanation:
hash("a") % 4 = 1(base bucket)- First 3 calls: count <= 3 (threshold), all go to bucket 1
- 4th call: count=4 > 3, key is now hot. Round-robin index 0 -> bucket
(1+0)%4 = 1 - 5th call: round-robin index 1 -> bucket
(1+1)%4 = 2 - 6th call: round-robin index 0 (cycles) -> bucket
(1+0)%4 = 1 - Load: bucket 1 has 5 events, bucket 2 has 1
Example 2:
Input:
["SkewHandler", "assign", "assign", "assign", "getLoad"]
[[2, 1, 2], ["k"], ["k"], ["k"], []]
Output: [null, 1, 1, 0, [1, 2]]
Explanation: hash("k") % 2 = 1. First call goes to bucket 1 (count=1, not hot). Second call: count=2 > 1, hot. Round-robin 0 -> bucket 1. Third call: round-robin 1 -> bucket (1+1)%2 = 0.
Code Environment
Sign in or try as guest to run your code.
Essential
SQL 0/33
Git 0/15
Spark 0/20
Snowflake 0/22
Python 0/24
Need more practice in this area? Explore more questions →
Okta
TCS
X
Accenture
Adobe
Google
LinkedIn
Samsung
Datadog
Wix
Dropbox
Meta
OpenAI
Hulu
Uber
DoorDash
Anthropic
Amazon
ActivisionBlizzard
Vercel
Crypto.Com
Zscaler
DeutscheBank
Apple
GoDaddy
BMW
PayPal
Snowflake
AMD
Twilio
Atlassian
JPMorgan
NVIDIA
IBM
Databricks
Coinbase
Cisco
Robinhood
Twitter
Microsoft
Palantir
Netflix
VMware
Cloudflare
Stripe
Capital One
Splunk
Intel
SAP
Tesla
GitHub
JaneStreet
Bloomberg
Salesforce
Elastic
CGI
UBS
GitLab
Ubisoft
Slack
Nintendo
EY
Kayak
Lyft
Airbnb
Walmart
Revolut
Visa
HashiCorp
Instacart
Mastercard