Skew-Aware Key Partitioner
Beginner Mode

Problem Statement

In distributed data processing (Spark, Kafka, MapReduce), events are partitioned across buckets by key. When certain keys are much more frequent than others ("hot keys" or "skewed keys"), one bucket becomes a bottleneck while others sit idle.

Design a partitioner that detects hot keys and spreads their events across multiple buckets to balance the load.

Implement the SkewHandler class:

  • SkewHandler(int numBuckets, int hotThreshold, int splitFactor) initializes the partitioner with:
    • numBuckets -- the total number of buckets (indexed 0 to numBuckets-1)
    • hotThreshold -- a key becomes "hot" once its total event count exceeds this value
    • splitFactor -- hot keys are spread across this many consecutive buckets
  • int assign(string key) assigns an event to a bucket and returns the bucket index:
    • Compute the base bucket as hash(key) % numBuckets using the hash function: h = 0; for each char c in key: h = h * 31 + ascii(c); return h % numBuckets
    • If the key is not hot (count <= threshold): return the base bucket.
    • If the key is hot (count > threshold): use round-robin across splitFactor buckets starting at base. The i-th hot assignment goes to bucket (base + i % splitFactor) % numBuckets.
    • Each call increments the key's count.
  • int[] getLoad() returns an array of length numBuckets where index i is the number of events assigned to bucket i.

Additional information

  • 1 <= numBuckets <= 100
  • 1 <= hotThreshold <= 10,000
  • 1 <= splitFactor <= numBuckets
  • key consists of lowercase English letters, length 1 to 20.
  • At most 50,000 calls will be made to assign and getLoad combined.

Example 1:

Input:
["SkewHandler", "assign", "assign", "assign", "assign", "assign", "assign", "getLoad"]
[[4, 3, 2], ["a"], ["a"], ["a"], ["a"], ["a"], ["a"], []]

Output: [null, 1, 1, 1, 1, 2, 1, [0, 5, 1, 0]]

Explanation:

  • hash("a") % 4 = 1 (base bucket)
  • First 3 calls: count <= 3 (threshold), all go to bucket 1
  • 4th call: count=4 > 3, key is now hot. Round-robin index 0 -> bucket (1+0)%4 = 1
  • 5th call: round-robin index 1 -> bucket (1+1)%4 = 2
  • 6th call: round-robin index 0 (cycles) -> bucket (1+0)%4 = 1
  • Load: bucket 1 has 5 events, bucket 2 has 1

Example 2:

Input:
["SkewHandler", "assign", "assign", "assign", "getLoad"]
[[2, 1, 2], ["k"], ["k"], ["k"], []]

Output: [null, 1, 1, 0, [1, 2]]

Explanation: hash("k") % 2 = 1. First call goes to bucket 1 (count=1, not hot). Second call: count=2 > 1, hot. Round-robin 0 -> bucket 1. Third call: round-robin 1 -> bucket (1+1)%2 = 0.

Quick Solution

Code Environment

Sign in or try as guest to run your code.

Sign In

Track

Question Difficulty Company Access
Need more practice in this area? Explore more questions →