Problem Statement
Design a data structure that processes a stream of integers and can report the k most frequent elements at any point.
Implement the TopKFrequent class:
TopKFrequent(int k) initializes the tracker with the target k.
void add(int num) processes a new integer from the stream.
int[] topK() returns the k most frequent elements seen so far, sorted by frequency in descending order. If two elements have the same frequency, the smaller value comes first. If fewer than k distinct elements have been seen, return all of them.
Additional information
1 <= k <= 1000
0 <= num <= 100,000
- At most
50,000 calls will be made to add and topK combined.
Example 1:
Input:
["TopKFrequent", "add", "add", "add", "topK", "add", "add", "add", "topK"]
[[2], [1], [1], [2], [], [3], [3], [3], []]
Output: [null, null, null, null, [1, 2], null, null, null, [3, 1]]
Explanation:
- After adding 1, 1, 2: freq(1)=2, freq(2)=1. Top 2: [1, 2]
- After adding 3, 3, 3: freq(3)=3, freq(1)=2, freq(2)=1. Top 2: [3, 1]
Example 2:
Input:
["TopKFrequent", "add", "topK"]
[[3], [5], []]
Output: [null, null, [5]]
Explanation: Only one element seen, so topK() returns [5] even though k=3.
Brute Force (Sort All Frequencies)
Intuition
Maintain a hash map of frequencies. Each time topK() is called, convert the hash map to a list, sort it by frequency descending (ties by value ascending), and return the first k elements.
Algorithm
- On
add(num): increment freq[num].
- On
topK(): sort all entries by (-frequency, value), return the first k values.
from collections import defaultdict
class TopKFrequent:
def __init__(self, k):
self.k = k
self.freq = defaultdict(int)
def add(self, num):
self.freq[num] += 1
def topK(self):
items = sorted(self.freq.items(), key=lambda x: (-x[1], x[0]))
return [num for num, _ in items[:self.k]]
Time & Space Complexity
Time complexity: O(1) per add, O(n log n) per topK
Reason: Sorting all n distinct elements dominates the topK call.
Space complexity: O(n)
Reason: The hash map stores all distinct elements.
Heap Selection (Optimal)
Intuition
Instead of sorting all elements, we only need the top k. A min-heap of size k lets us efficiently select the k largest frequencies. We push (-frequency, value) pairs and use heapq.nsmallest which internally uses a heap bounded at size k.
Algorithm
- On
add(num): increment freq[num] in O(1).
- On
topK(): build a list of (-count, num) for all entries, use heapq.nsmallest(k, ...) to get the k elements with the highest frequency (smallest negative count), breaking ties by smallest value.
import heapq
from collections import defaultdict
class TopKFrequent:
def __init__(self, k):
self.k = k
self.freq = defaultdict(int)
def add(self, num):
self.freq[num] += 1
def topK(self):
items = [(-count, num) for num, count in self.freq.items()]
top = heapq.nsmallest(self.k, items)
return [num for _, num in top]
Time & Space Complexity
Time complexity: O(1) per add, O(n log k) per topK
Reason: heapq.nsmallest(k, n items) runs in O(n log k) using a bounded heap of size k, which is faster than full sort when k is much smaller than n.
Space complexity: O(n + k)
Reason: O(n) for the frequency map, O(k) for the heap during topK.