Problem Statement
A trie (pronounced as "try") or prefix tree is a tree data structure used to efficiently store and retrieve keys in a dataset of strings. There are various applications of this data structure, such as autocomplete and spellcheckers.
Implement the Trie class:
Trie() Initializes the trie object.
void insert(String word) Inserts the string word into the trie.
boolean search(String word) Returns true if the string word is in the trie (i.e., was inserted before), and false otherwise.
boolean startsWith(String prefix) Returns true if there is a previously inserted string word that has the prefix prefix, and false otherwise.
Additional information
1 <= word.length, prefix.length <= 2000
word and prefix consist only of lowercase English letters.
- At most
3 * 10^4 calls in total will be made to insert, search, and startsWith.
Example 1:
Input
["Trie", "insert", "search", "search", "startsWith", "insert", "search"]
[[], ["apple"], ["apple"], ["app"], ["app"], ["app"], ["app"]]
Output
[null, null, true, false, true, null, true]
Explanation:
Trie trie = new Trie();
trie.insert("apple");
trie.search("apple"); // return True
trie.search("app"); // return False
trie.startsWith("app"); // return True
trie.insert("app");
trie.search("app"); // return True
Brute Force (Hash Set / List)
Intuition
The simplest way to store and search for words is to throw them all into a standard Hash Set (or a List). Searching for an exact word is incredibly fast with a Hash Set.
However, the problem arises when we need to find if any word starts with a specific prefix. A Hash Set hashes the entire string, so it cannot help us find partial matches. We would be forced to iterate through every single word in our data structure and check if it begins with the target prefix.
Algorithm
- Initialize: Create an empty
words list (or set).
- insert: Append the
word to self.words.
- search: Check if
word exists in self.words.
- startsWith: Loop through every word in
self.words. Check if word.startswith(prefix). If any match is found, return True.
class Trie:
def __init__(self):
self.words = set()
def insert(self, word: str) -> None:
self.words.add(word)
def search(self, word: str) -> bool:
return word in self.words
def startsWith(self, prefix: str) -> bool:
# Must iterate through all stored words to check prefixes
for word in self.words:
if word.startswith(prefix):
return True
return False
Time & Space Complexity
- Time complexity:
insert: O(L), where L is the length of the word (due to string hashing).
search: O(L) for exact set lookup.
startsWith: O(N * L), where N is the total number of words inserted. We have to check every single word. This will trigger a Time Limit Exceeded (TLE) error on large datasets.
- Space complexity: O(N * L) to store all strings independently without sharing any common prefix memory.
Trie Node Architecture - Optimal
Intuition
We can eliminate the massive O(N * L) prefix search time by building a Prefix Tree. Instead of storing whole strings, we store individual characters as nodes in a tree.
If we insert "apple" and "app", they will share the exact same path for the letters a -> p -> p. To differentiate between a prefix and a complete word, each node will have a boolean flag is_end_of_word.
When we search for a word or a prefix, we simply start at the root node and walk down the tree letter by letter. If we ever look for a child node that doesn't exist, the word is not in the tree.
Algorithm
- Define Node: Create a
TrieNode class. It needs a dictionary children to map characters to the next node, and a boolean is_end_of_word set to False.
- Initialize: Create a
self.root which is an empty TrieNode.
- insert: * Start a pointer
curr at self.root.
- Loop through each character
char in the string.
- If
char is not in curr.children, create a new TrieNode and map it.
- Move the pointer down:
curr = curr.children[char].
- After the loop finishes, mark the final node:
curr.is_end_of_word = True.
- search:
- Start at
self.root.
- Walk down the tree following the characters of the word.
- If a character is missing, return
False.
- If the loop finishes, return the value of
curr.is_end_of_word (it must be a complete word, not just a prefix).
- startsWith:
- Exact same logic as
search, but if the loop finishes successfully without hitting a missing character, immediately return True (we don't care if it's the end of a word or not).
class TrieNode:
def __init__(self):
# Maps a character to another TrieNode
self.children = {}
# Marks if this node represents the final letter of an inserted word
self.is_end_of_word = False
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word: str) -> None:
curr = self.root
for char in word:
if char not in curr.children:
curr.children[char] = TrieNode()
curr = curr.children[char]
# Mark the end of the complete word
curr.is_end_of_word = True
def search(self, word: str) -> bool:
curr = self.root
for char in word:
if char not in curr.children:
return False
curr = curr.children[char]
# True if it's a completely inserted word, False if it's only a prefix
return curr.is_end_of_word
def startsWith(self, prefix: str) -> bool:
curr = self.root
for char in prefix:
if char not in curr.children:
return False
curr = curr.children[char]
# We successfully navigated the prefix path
return True
Time & Space Complexity
- Time complexity: O(L) for
insert, search, and startsWith, where L is the length of the string. We do exactly one hash map lookup per character.
- Space complexity: O(N * L) in the absolute worst case where no words share any prefixes. However, in reality, it is incredibly space-efficient because words sharing prefixes share the exact same nodes in memory.