Design Spreadsheet with Formulas
Asked at:
OpenAI
DESCRIPTION
Design a spreadsheet API where cells can contain either raw values or formulas that reference other cells (e.g., =A1+B2). The system must automatically propagate updates through dependent cells when a value changes, maintain a dependency graph to track cell relationships, detect circular references to prevent infinite loops, and use memoization to cache computed values for efficient recalculation. For example, if cell C1 contains =A1+B1, updating A1 should automatically trigger recalculation of C1.
Input:
Set A1=5, B1=10, C1="=A1+B1" Get C1
Output:
15
Explanation: C1 evaluates the formula by fetching values from A1 and B1
Constraints:
- Cell names are alphanumeric strings (e.g., A1, B2, Total)
- Formulas start with = and can reference other cells
- Must detect circular dependencies before evaluation
- Recalculation should only affect dependent cells, not the entire sheet
- Support basic arithmetic operations (+, -, *, /)
Understanding the Problem
The core challenge is maintaining bidirectional relationships between cells: each cell must know which cells it depends on (for evaluation) and which cells depend on it (for propagation). When a cell updates, we need to invalidate cached values of all downstream dependents and recalculate them in the correct order. Cycle detection requires tracking the evaluation path to identify when a cell references itself directly or indirectly. The system must balance lazy evaluation (compute only when needed) with eager invalidation (mark stale values immediately).
Building Intuition
A naive approach would recalculate all cells on every update, which is inefficient for large spreadsheets. A better approach uses a directed acyclic graph (DAG) where edges represent dependencies: if C1=A1+B1, then C1 has edges from A1 and B1. When A1 changes, we traverse the graph to find all downstream cells and recalculate only those. For example, updating A1 in a sheet with B1=A1*2 and C1=B1+5 would only recalculate B1 and C1, not unrelated cells.
This pattern appears in reactive systems like spreadsheets, build systems (Make, Bazel), and UI frameworks (React, Vue) where changes propagate through dependencies. Efficient dependency tracking prevents unnecessary recomputation, which is critical for real-time applications where users expect instant feedback. Understanding topological sorting and cycle detection is essential for any system with interdependent components.
Common Pitfalls
Implementation
Cell Storage and Formula Parsing
Implement the core data structures to store cell values, formulas, and cached results. Each cell maintains its raw input (value or formula string), parsed dependencies (list of referenced cells), and computed value cache. When setting a cell, parse the formula to extract cell references using regex or a simple tokenizer. For example, parsing =A1+B2*3 extracts dependencies [A1, B2]. Store both the forward graph (cell → dependencies) and reverse graph (cell → dependents) for efficient traversal.
import reclass Cell:def __init__(self):self.raw_input = Noneself.dependencies = set()self.cached_value = Noneself.is_formula = Falseclass Spreadsheet:def __init__(self):self.cells = {}self.forward_graph = {}self.reverse_graph = {}def parse_formula(self, formula):pattern = r'\b([A-Z]+[0-9]+)\b'return set(re.findall(pattern, formula))def set_cell(self, cell_id, value):if cell_id not in self.cells:self.cells[cell_id] = Cell()cell = self.cells[cell_id]cell.raw_input = valuecell.is_formula = isinstance(value, str) and value.startswith('=')old_deps = self.forward_graph.get(cell_id, set())for dep in old_deps:self.reverse_graph.get(dep, set()).discard(cell_id)if cell.is_formula:cell.dependencies = self.parse_formula(value)self.forward_graph[cell_id] = cell.dependenciesfor dep in cell.dependencies:self.reverse_graph.setdefault(dep, set()).add(cell_id)else:cell.dependencies = set()self.forward_graph[cell_id] = set()cell.cached_value = value
Dependency Graph Management
Build methods to update the dependency graph when cells change. When a cell's formula is set, remove its old outgoing edges from the graph, parse the new formula to find dependencies, and add new edges. Update the reverse graph so each dependency knows which cells depend on it. For example, changing C1 from =A1+B1 to =A1+D1 removes the edge C1→B1 and adds C1→D1, while updating B1's dependents list to remove C1 and D1's to add C1.
def update_dependencies(self, cell_id, new_formula):# Remove old outgoing edgesif cell_id in self.dependencies:for dep in self.dependencies[cell_id]:self.reverse_deps[dep].discard(cell_id)del self.dependencies[cell_id]# Parse new formula for dependenciesif new_formula and new_formula.startswith('='):deps = self.parse_formula(new_formula)self.dependencies[cell_id] = set(deps)# Update reverse graphfor dep in deps:if dep not in self.reverse_deps:self.reverse_deps[dep] = set()self.reverse_deps[dep].add(cell_id)else:self.dependencies[cell_id] = set()
Cycle Detection
Implement depth-first search (DFS) to detect circular references before allowing a formula to be set. Starting from the cell being updated, traverse its dependencies recursively while maintaining a visited set for the current path. If we encounter a cell already in the current path, a cycle exists. For example, setting A1=B1 when B1=C1 and C1=A1 would detect the cycle A1→B1→C1→A1 and reject the operation.
def has_cycle(self, cell_id, new_formula):dependencies = self._parse_dependencies(new_formula)visited = set()def dfs(current):if current in visited:return Trueif current not in self.dependencies:return Falsevisited.add(current)for dep in self.dependencies[current]:if dfs(dep):return Truevisited.remove(current)return Falsefor dep in dependencies:if dfs(dep):return Truereturn False
Evaluation and Memoization
Implement lazy evaluation with caching: compute a cell's value only when requested, and cache the result until dependencies change. When evaluating a formula, recursively get values of dependencies (which may trigger their evaluation), then apply the formula operations. Mark cached values as stale when a cell updates by traversing the reverse graph to find all dependents. For example, updating A1 marks B1=A1*2 and C1=B1+5 as stale, but they're only recomputed when accessed.
def get_value(self, cell_id):if cell_id not in self.cache or self.stale[cell_id]:cell = self.cells[cell_id]if cell['type'] == 'value':self.cache[cell_id] = cell['data']else: # formuladeps = self.dependencies[cell_id]dep_values = {dep: self.get_value(dep) for dep in deps}self.cache[cell_id] = self.evaluate_formula(cell['data'], dep_values)self.stale[cell_id] = Falsereturn self.cache[cell_id]def set_value(self, cell_id, value):self.cells[cell_id] = {'type': 'value', 'data': value}self.mark_stale(cell_id)def mark_stale(self, cell_id):for dependent in self.reverse_deps.get(cell_id, []):if not self.stale[dependent]:self.stale[dependent] = Trueself.mark_stale(dependent)
Update Propagation
Implement the public API methods for setting and getting cell values. set(cell, value) updates the cell, invalidates dependent caches, and performs cycle detection if the value is a formula. get(cell) returns the cached value if valid, otherwise evaluates the cell and caches the result. For example, set('A1', 5) followed by get('C1') where C1=A1+10 would evaluate C1 to 15 and cache it for future gets.
def set(self, cell, value):if isinstance(value, str) and value.startswith('='):deps = self._parse_formula(value)if self._has_cycle(cell, deps):raise ValueError(f"Circular reference detected for {cell}")self.formulas[cell] = valueself.dependencies[cell] = depselse:self.values[cell] = valueself.formulas.pop(cell, None)self.dependencies.pop(cell, None)self._invalidate_dependents(cell)def get(self, cell):if cell in self.cache and self.cache_valid.get(cell, False):return self.cache[cell]result = self._evaluate(cell)self.cache[cell] = resultself.cache_valid[cell] = Truereturn result
What We've Learned
- Pattern: Use a bidirectional dependency graph (forward for evaluation, reverse for invalidation) to efficiently track and propagate changes
- Use Case: This reactive dependency pattern applies to build systems, UI frameworks, and any system where changes cascade through relationships
- Optimization: Lazy evaluation with memoization avoids unnecessary computation by only recalculating cells when their values are actually needed
- Safety: Always perform cycle detection before adding dependencies to prevent infinite loops from being created in the first place
Problems to Practice
medium
This problem requires building a dependency graph and detecting cycles to determine if courses can be completed, which directly parallels the spreadsheet's need for dependency tracking and cycle detection to prevent circular formula references.
medium
Building on Course Schedule, this problem requires topological sorting to determine the order of course completion, similar to how a spreadsheet must determine the correct order to evaluate formulas based on their dependencies.
Question Timeline
See when this question was last asked and where, including any notes left by other candidates.
Mid December, 2024
OpenAI
Senior
Design a spreadsheet API where cells can contain values or formulas that reference other cells, automatically handling updates when dependencies change. Implement dependency graph tracking to manage cell relationships, cycle detection to prevent infinite loops from circular references, and memoization to cache computed values for efficient recalculation.
Comments
Hello Interview Premium
Your account is free and you can post anonymously if you choose.