Limited Time Offer:Up to 25% off Hello Interview Premium

⌘K

Tutor

Get Premium

Leetcode 642. Design Search Autocomplete System

Asked at:

Google

DESCRIPTION

Design an autocomplete system that processes user input character-by-character and returns the top 3 most frequent sentences matching the current prefix. Sentences are ranked by frequency (descending), with lexicographical order as a tiebreaker. When the user enters '#', the current sentence is saved with its frequency incremented. For example, if the system has seen "hello world" 5 times and "hello interview" 3 times, typing 'h' then 'e' then 'l' should return ["hello world", "hello interview"] after each character.

Input:

sentences = ["hello world", "hello interview", "hello"]
frequencies = [5, 3, 2]
user types: 'h', 'e', 'l', 'l', 'o', '#'

Output:

After 'h': ["hello world", "hello interview", "hello"]
After 'e': ["hello world", "hello interview", "hello"]
After 'l': ["hello world", "hello interview", "hello"]
After 'l': ["hello world", "hello interview", "hello"]
After 'o': ["hello world", "hello interview", "hello"]
After '#': [] (sentence "hello" saved with frequency 3)

Explanation: Each character extends the prefix. The '#' character completes input and increments frequency of "hello" from 2 to 3.

Constraints:

Sentences contain only lowercase letters, spaces, and may be up to 100 characters
Initial frequencies are positive integers
Must return at most 3 results per query
System must handle incremental character input efficiently

Understanding the Problem

The core challenge is efficiently retrieving top-k sentences for any prefix while supporting dynamic frequency updates. A naive approach of scanning all sentences per character is too slow. We need a prefix-indexed structure (like a trie) where each node stores candidate sentences, combined with a mechanism to maintain top-3 rankings at query time. The '#' character triggers an update that increments frequency and may reorder rankings for all prefixes of that sentence.

Building Intuition

A naive approach would store all sentences in a list and filter by prefix on each character, then sort by frequency—this is O(N log N) per query. A better approach uses a trie where each node maps characters to child nodes, and we store sentence candidates at each node. When querying, we traverse the trie following the prefix, then retrieve and rank the top 3 sentences from that node's candidates. For example, typing 'h' → 'e' → 'l' navigates to the node representing prefix "hel", which stores all sentences starting with "hel".

Autocomplete systems power search engines, IDEs, and messaging apps where millisecond response times are critical. Efficient prefix queries enable real-time suggestions as users type, improving user experience. The frequency-based ranking ensures personalized, relevant results based on historical usage patterns.

Common Pitfalls

Storing full sorted lists at each trie node: Wastes memory and makes updates slow—instead, store sentence references and sort only the top-k on demand

Forgetting lexicographical tiebreaker: When frequencies match, must compare sentences alphabetically to ensure deterministic ordering

Not resetting prefix state on '#': The current input buffer must clear after saving a sentence to start fresh for the next query

Implementation

Trie Structure and Sentence Storage

Build a trie (prefix tree) where each TrieNode has a children map (character → node) and a sentences set storing all complete sentences passing through this prefix. Maintain a global frequency map tracking counts for each sentence. When initializing, insert all historical sentences into the trie by traversing character-by-character and adding the sentence to every node along the path. For example, inserting "hello" adds it to nodes for 'h', 'he', 'hel', 'hell', and 'hello'.

Solution

class TrieNode:
    def __init__(self):
        self.children = {}
        self.sentences = set()
class AutocompleteSystem:
    def __init__(self, sentences, times):
        self.root = TrieNode()
        self.frequency = {}
        for sentence, count in zip(sentences, times):
            self.frequency[sentence] = count
            self._insert(sentence)
    
    def _insert(self, sentence):
        node = self.root
        for char in sentence:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
            node.sentences.add(sentence)

Prefix Query and Top-K Retrieval

Implement the input(c) method that handles character-by-character queries. Maintain a current_prefix buffer that accumulates characters. For non-'#' characters, append to the buffer and traverse the trie to the corresponding node. Retrieve all sentences from that node's sentences set, sort them by (-frequency, sentence) to get descending frequency with lexicographical tiebreaking, and return the top 3. For example, at prefix "hel", if the node contains {"hello world": 5, "hello": 3}, return ["hello world", "hello"].

Solution

    def input(self, c: str):
        if not hasattr(self, 'current_prefix'):
            self.current_prefix = ""
        
        if c == '#':
            if self.current_prefix:
                self.frequency[self.current_prefix] = self.frequency.get(self.current_prefix, 0) + 1
                self._insert(self.current_prefix)
            self.current_prefix = ""
            return []
        
        self.current_prefix += c
        node = self.root
        for char in self.current_prefix:
            if char not in node.children:
                return []
            node = node.children[char]
        
        candidates = list(node.sentences)
        candidates.sort(key=lambda s: (-self.frequency[s], s))
        return candidates[:3]

Sentence Insertion on '#' Character

When input('#') is called, take the accumulated current_prefix as a complete sentence. Increment its frequency in the global frequency map (initialize to 0 if new). Insert the sentence into the trie by traversing each character and adding it to every node's sentences set along the path—this ensures future prefix queries include this sentence. Finally, clear current_prefix to reset for the next input sequence. For example, if the user typed "hi" then '#', increment frequency["hi"] and add "hi" to nodes 'h' and 'hi'.

Solution

# This section is already implemented in Section 2
# The '#' handling logic is integrated in the input() method:
# - Increments frequency for current_prefix
# - Calls _insert() to add sentence to trie
# - Clears current_prefix
# No additional code needed - the implementation in Section 2
# already handles sentence insertion on '#' character

Complete Integration

Combine the trie structure (Section 1), query logic (Section 2), and insertion logic (Section 3) into a unified AutocompleteSystem class. The constructor initializes the trie with historical sentences and frequencies. The input(c) method routes to either the query path (returning top-3 results) or the insertion path (saving the sentence on '#'). This design ensures O(P + M log M) query time where P is prefix length and M is matching sentences, and O(L) insertion time where L is sentence length.

Solution

# Complete Integration - All components unified in AutocompleteSystem
# The class already integrates:
# - Trie structure (Section 1): TrieNode with children and sentences
# - Query logic (Section 2): input() method with prefix traversal and top-3 sorting
# - Insertion logic (Section 3): '#' handling in input() method
# Example usage:
# system = AutocompleteSystem(["hello interview", "hello world"], [5, 3])
# system.input('h')  # Returns top 3 sentences starting with 'h'
# system.input('e')  # Returns top 3 sentences starting with 'he'
# system.input('l')  # Returns top 3 sentences starting with 'hel'
# system.input('#')  # Saves "hel" with frequency 1, returns []

What We've Learned

Pattern: Use a trie with sentence sets at each node to enable efficient prefix-based retrieval without scanning all data
Ranking: Sort by (-frequency, sentence) tuple to achieve descending frequency with lexicographical tiebreaking in one operation
State Management: Maintain a current prefix buffer that accumulates characters and resets on `'#'` to separate query sessions
Use Case: Powers real-time autocomplete in search engines, code editors, and messaging apps where sub-100ms response is critical

Problems to Practice

Overview

Lesson

Trie

This lesson introduces the Trie data structure, which is the fundamental data structure needed to implement an efficient autocomplete system. Understanding Trie basics is essential for solving prefix-based search problems like the autocomplete system.

Learn

Implement Trie Methods

medium

Trie

This problem teaches how to build and implement core Trie operations (insert, search, startsWith), which are the exact building blocks needed for an autocomplete system that performs prefix matching and sentence storage.

Practice

Prefix Matching

medium

Trie

This problem directly practices prefix-based queries using a Trie structure, which is the core operation in an autocomplete system where you need to find all sentences matching a given prefix and return the top results.

Practice

Question Timeline

See when this question was last asked and where, including any notes left by other candidates.

Company

Level

Region

Early November, 2025

Google

Senior

Mid September, 2025

Google

Mid-level

Comments

Your account is free and you can post anonymously if you choose.

Hello Interview Premium

Recent interview questions

System Design Guided Practice

Exclusive content

Learn More

Leetcode 642. Design Search Autocomplete System

DESCRIPTION

Understanding the Problem

Building Intuition

Common Pitfalls

Implementation

Trie Structure and Sentence Storage

Prefix Query and Top-K Retrieval

Sentence Insertion on '#' Character

Complete Integration

What We've Learned

Problems to Practice

Question Timeline

Comments

Questions

Learn

Links

Legal

Contact