Skew-Aware Key Partitioner
Beginner Mode
Problem Statement
In distributed data processing (Spark, Kafka, MapReduce), events are partitioned across buckets by key. When certain keys are much more frequent than others ("hot keys" or "skewed keys"), one bucket becomes a bottleneck while others sit idle.
Design a partitioner that detects hot keys and spreads their events across multiple buckets to balance the load.
Implement the SkewHandler class:
SkewHandler(int numBuckets, int hotThreshold, int splitFactor)initializes the partitioner with:numBuckets-- the total number of buckets (indexed 0 to numBuckets-1)hotThreshold-- a key becomes "hot" once its total event count exceeds this valuesplitFactor-- hot keys are spread across this many consecutive buckets
int assign(string key)assigns an event to a bucket and returns the bucket index:- Compute the base bucket as
hash(key) % numBucketsusing the hash function:h = 0; for each char c in key: h = h * 31 + ascii(c); return h % numBuckets - If the key is not hot (count <= threshold): return the base bucket.
- If the key is hot (count > threshold): use round-robin across
splitFactorbuckets starting atbase. The i-th hot assignment goes to bucket(base + i % splitFactor) % numBuckets. - Each call increments the key's count.
- Compute the base bucket as
int[] getLoad()returns an array of lengthnumBucketswhere indexiis the number of events assigned to bucketi.
Additional information
1 <= numBuckets <= 1001 <= hotThreshold <= 10,0001 <= splitFactor <= numBucketskeyconsists of lowercase English letters, length 1 to 20.- At most
50,000calls will be made toassignandgetLoadcombined.
Example 1:
Input:
["SkewHandler", "assign", "assign", "assign", "assign", "assign", "assign", "getLoad"]
[[4, 3, 2], ["a"], ["a"], ["a"], ["a"], ["a"], ["a"], []]
Output: [null, 1, 1, 1, 1, 2, 1, [0, 5, 1, 0]]
Explanation:
hash("a") % 4 = 1(base bucket)- First 3 calls: count <= 3 (threshold), all go to bucket 1
- 4th call: count=4 > 3, key is now hot. Round-robin index 0 -> bucket
(1+0)%4 = 1 - 5th call: round-robin index 1 -> bucket
(1+1)%4 = 2 - 6th call: round-robin index 0 (cycles) -> bucket
(1+0)%4 = 1 - Load: bucket 1 has 5 events, bucket 2 has 1
Example 2:
Input:
["SkewHandler", "assign", "assign", "assign", "getLoad"]
[[2, 1, 2], ["k"], ["k"], ["k"], []]
Output: [null, 1, 1, 0, [1, 2]]
Explanation: hash("k") % 2 = 1. First call goes to bucket 1 (count=1, not hot). Second call: count=2 > 1, hot. Round-robin 0 -> bucket 1. Third call: round-robin 1 -> bucket (1+1)%2 = 0.
Code Environment
Sign in or try as guest to run your code.
Track
| Question | Difficulty | Company | Access |
|---|
Need more practice in this area? Explore more questions →
Okta