Search
⌘K

Design Spreadsheet with Formulas

Asked at:

OpenAI


DESCRIPTION

Design a spreadsheet API where cells can contain either raw values or formulas that reference other cells (e.g., =A1+B2). The system must automatically propagate updates through dependent cells when a value changes, maintain a dependency graph to track cell relationships, detect circular references to prevent infinite loops, and use memoization to cache computed values for efficient recalculation. For example, if cell C1 contains =A1+B1, updating A1 should automatically trigger recalculation of C1.

Input:

Set A1=5, B1=10, C1="=A1+B1"
Get C1

Output:

15


Explanation: C1 evaluates the formula by fetching values from A1 and B1

Constraints:

  • Cell names are alphanumeric strings (e.g., A1, B2, Total)
  • Formulas start with = and can reference other cells
  • Must detect circular dependencies before evaluation
  • Recalculation should only affect dependent cells, not the entire sheet
  • Support basic arithmetic operations (+, -, *, /)

Understanding the Problem

The core challenge is maintaining bidirectional relationships between cells: each cell must know which cells it depends on (for evaluation) and which cells depend on it (for propagation). When a cell updates, we need to invalidate cached values of all downstream dependents and recalculate them in the correct order. Cycle detection requires tracking the evaluation path to identify when a cell references itself directly or indirectly. The system must balance lazy evaluation (compute only when needed) with eager invalidation (mark stale values immediately).

Building Intuition

A naive approach would recalculate all cells on every update, which is inefficient for large spreadsheets. A better approach uses a directed acyclic graph (DAG) where edges represent dependencies: if C1=A1+B1, then C1 has edges from A1 and B1. When A1 changes, we traverse the graph to find all downstream cells and recalculate only those. For example, updating A1 in a sheet with B1=A1*2 and C1=B1+5 would only recalculate B1 and C1, not unrelated cells.

This pattern appears in reactive systems like spreadsheets, build systems (Make, Bazel), and UI frameworks (React, Vue) where changes propagate through dependencies. Efficient dependency tracking prevents unnecessary recomputation, which is critical for real-time applications where users expect instant feedback. Understanding topological sorting and cycle detection is essential for any system with interdependent components.

Common Pitfalls

Implementation

Cell Storage and Formula Parsing

Implement the core data structures to store cell values, formulas, and cached results. Each cell maintains its raw input (value or formula string), parsed dependencies (list of referenced cells), and computed value cache. When setting a cell, parse the formula to extract cell references using regex or a simple tokenizer. For example, parsing =A1+B2*3 extracts dependencies [A1, B2]. Store both the forward graph (cell → dependencies) and reverse graph (cell → dependents) for efficient traversal.

Solution
import re
class Cell:
def __init__(self):
self.raw_input = None
self.dependencies = set()
self.cached_value = None
self.is_formula = False
class Spreadsheet:
def __init__(self):
self.cells = {}
self.forward_graph = {}
self.reverse_graph = {}
def parse_formula(self, formula):
pattern = r'\b([A-Z]+[0-9]+)\b'
return set(re.findall(pattern, formula))
def set_cell(self, cell_id, value):
if cell_id not in self.cells:
self.cells[cell_id] = Cell()
cell = self.cells[cell_id]
cell.raw_input = value
cell.is_formula = isinstance(value, str) and value.startswith('=')
old_deps = self.forward_graph.get(cell_id, set())
for dep in old_deps:
self.reverse_graph.get(dep, set()).discard(cell_id)
if cell.is_formula:
cell.dependencies = self.parse_formula(value)
self.forward_graph[cell_id] = cell.dependencies
for dep in cell.dependencies:
self.reverse_graph.setdefault(dep, set()).add(cell_id)
else:
cell.dependencies = set()
self.forward_graph[cell_id] = set()
cell.cached_value = value
Dependency Graph Management

Build methods to update the dependency graph when cells change. When a cell's formula is set, remove its old outgoing edges from the graph, parse the new formula to find dependencies, and add new edges. Update the reverse graph so each dependency knows which cells depend on it. For example, changing C1 from =A1+B1 to =A1+D1 removes the edge C1→B1 and adds C1→D1, while updating B1's dependents list to remove C1 and D1's to add C1.

Solution
def update_dependencies(self, cell_id, new_formula):
# Remove old outgoing edges
if cell_id in self.dependencies:
for dep in self.dependencies[cell_id]:
self.reverse_deps[dep].discard(cell_id)
del self.dependencies[cell_id]
# Parse new formula for dependencies
if new_formula and new_formula.startswith('='):
deps = self.parse_formula(new_formula)
self.dependencies[cell_id] = set(deps)
# Update reverse graph
for dep in deps:
if dep not in self.reverse_deps:
self.reverse_deps[dep] = set()
self.reverse_deps[dep].add(cell_id)
else:
self.dependencies[cell_id] = set()
Cycle Detection

Implement depth-first search (DFS) to detect circular references before allowing a formula to be set. Starting from the cell being updated, traverse its dependencies recursively while maintaining a visited set for the current path. If we encounter a cell already in the current path, a cycle exists. For example, setting A1=B1 when B1=C1 and C1=A1 would detect the cycle A1→B1→C1→A1 and reject the operation.

Solution
def has_cycle(self, cell_id, new_formula):
dependencies = self._parse_dependencies(new_formula)
visited = set()
def dfs(current):
if current in visited:
return True
if current not in self.dependencies:
return False
visited.add(current)
for dep in self.dependencies[current]:
if dfs(dep):
return True
visited.remove(current)
return False
for dep in dependencies:
if dfs(dep):
return True
return False
Evaluation and Memoization

Implement lazy evaluation with caching: compute a cell's value only when requested, and cache the result until dependencies change. When evaluating a formula, recursively get values of dependencies (which may trigger their evaluation), then apply the formula operations. Mark cached values as stale when a cell updates by traversing the reverse graph to find all dependents. For example, updating A1 marks B1=A1*2 and C1=B1+5 as stale, but they're only recomputed when accessed.

Solution
def get_value(self, cell_id):
if cell_id not in self.cache or self.stale[cell_id]:
cell = self.cells[cell_id]
if cell['type'] == 'value':
self.cache[cell_id] = cell['data']
else: # formula
deps = self.dependencies[cell_id]
dep_values = {dep: self.get_value(dep) for dep in deps}
self.cache[cell_id] = self.evaluate_formula(cell['data'], dep_values)
self.stale[cell_id] = False
return self.cache[cell_id]
def set_value(self, cell_id, value):
self.cells[cell_id] = {'type': 'value', 'data': value}
self.mark_stale(cell_id)
def mark_stale(self, cell_id):
for dependent in self.reverse_deps.get(cell_id, []):
if not self.stale[dependent]:
self.stale[dependent] = True
self.mark_stale(dependent)
Update Propagation

Implement the public API methods for setting and getting cell values. set(cell, value) updates the cell, invalidates dependent caches, and performs cycle detection if the value is a formula. get(cell) returns the cached value if valid, otherwise evaluates the cell and caches the result. For example, set('A1', 5) followed by get('C1') where C1=A1+10 would evaluate C1 to 15 and cache it for future gets.

Solution
def set(self, cell, value):
if isinstance(value, str) and value.startswith('='):
deps = self._parse_formula(value)
if self._has_cycle(cell, deps):
raise ValueError(f"Circular reference detected for {cell}")
self.formulas[cell] = value
self.dependencies[cell] = deps
else:
self.values[cell] = value
self.formulas.pop(cell, None)
self.dependencies.pop(cell, None)
self._invalidate_dependents(cell)
def get(self, cell):
if cell in self.cache and self.cache_valid.get(cell, False):
return self.cache[cell]
result = self._evaluate(cell)
self.cache[cell] = result
self.cache_valid[cell] = True
return result

What We've Learned

  • Pattern: Use a bidirectional dependency graph (forward for evaluation, reverse for invalidation) to efficiently track and propagate changes
  • Use Case: This reactive dependency pattern applies to build systems, UI frameworks, and any system where changes cascade through relationships
  • Optimization: Lazy evaluation with memoization avoids unnecessary computation by only recalculating cells when their values are actually needed
  • Safety: Always perform cycle detection before adding dependencies to prevent infinite loops from being created in the first place

Problems to Practice

Course Schedule

medium

Graphs

This problem requires building a dependency graph and detecting cycles to determine if courses can be completed, which directly parallels the spreadsheet's need for dependency tracking and cycle detection to prevent circular formula references.

Course Schedule II

medium

Graphs

Building on Course Schedule, this problem requires topological sorting to determine the order of course completion, similar to how a spreadsheet must determine the correct order to evaluate formulas based on their dependencies.

Overview
Lesson
Graphs

This lesson covers topological sorting fundamentals, which is the core algorithm needed for managing dependency graphs in spreadsheets to ensure formulas are evaluated in the correct order without circular dependencies.

Question Timeline

See when this question was last asked and where, including any notes left by other candidates.

Mid December, 2024

OpenAI

Senior

Design a spreadsheet API where cells can contain values or formulas that reference other cells, automatically handling updates when dependencies change. Implement dependency graph tracking to manage cell relationships, cycle detection to prevent infinite loops from circular references, and memoization to cache computed values for efficient recalculation.

Comments

Your account is free and you can post anonymously if you choose.