Limited Time Offer:Up to 25% off Hello Interview Premium

Your Dashboard

Interview Coaching

Mock Interviews

1:1 Mentorship

Salary Negotiation

Practice

Learn

System Design

ML System Design

Low Level Design

Code

Behavioral

Salary Negotiation

Interview Guides

Blog

Community

Interview Questions

Interview Experiences

Peer System Design Library

Ask The Community

Discord

Refer a Friend

Pricing

Become a Coach

⌘K

Pricing Become a Coach

⌘K

Tutor

Get Premium

Design Spreadsheet with Formulas

Asked at:

OpenAI

DESCRIPTION

Design a spreadsheet API where cells can contain either raw values or formulas that reference other cells (e.g., =A1+B2). The system must automatically propagate updates through dependent cells when a value changes, maintain a dependency graph to track cell relationships, detect circular references to prevent infinite loops, and use memoization to cache computed values for efficient recalculation. For example, if cell C1 contains =A1+B1, updating A1 should automatically trigger recalculation of C1.

Input:

Set A1=5, B1=10, C1="=A1+B1"
Get C1

Output:

Explanation: C1 evaluates the formula by fetching values from A1 and B1

Constraints:

Cell names are alphanumeric strings (e.g., A1, B2, Total)
Formulas start with = and can reference other cells
Must detect circular dependencies before evaluation
Recalculation should only affect dependent cells, not the entire sheet
Support basic arithmetic operations (+, -, *, /)

Understanding the Problem

The core challenge is maintaining bidirectional relationships between cells: each cell must know which cells it depends on (for evaluation) and which cells depend on it (for propagation). When a cell updates, we need to invalidate cached values of all downstream dependents and recalculate them in the correct order. Cycle detection requires tracking the evaluation path to identify when a cell references itself directly or indirectly. The system must balance lazy evaluation (compute only when needed) with eager invalidation (mark stale values immediately).

Building Intuition

A naive approach would recalculate all cells on every update, which is inefficient for large spreadsheets. A better approach uses a directed acyclic graph (DAG) where edges represent dependencies: if C1=A1+B1, then C1 has edges from A1 and B1. When A1 changes, we traverse the graph to find all downstream cells and recalculate only those. For example, updating A1 in a sheet with B1=A1*2 and C1=B1+5 would only recalculate B1 and C1, not unrelated cells.

This pattern appears in reactive systems like spreadsheets, build systems (Make, Bazel), and UI frameworks (React, Vue) where changes propagate through dependencies. Efficient dependency tracking prevents unnecessary recomputation, which is critical for real-time applications where users expect instant feedback. Understanding topological sorting and cycle detection is essential for any system with interdependent components.

Common Pitfalls

Forgetting to invalidate caches: When a cell updates, all dependent cells' cached values must be marked stale, otherwise they return outdated results

Incorrect cycle detection: Must check for cycles during formula parsing, not during evaluation, to prevent infinite loops from ever being created

Wrong recalculation order: Dependencies must be evaluated in topological order (dependencies before dependents) to ensure correct values

Memory leaks in dependency graph: When a cell's formula changes, must remove old dependency edges before adding new ones

Implementation

Cell Storage and Formula Parsing

Implement the core data structures to store cell values, formulas, and cached results. Each cell maintains its raw input (value or formula string), parsed dependencies (list of referenced cells), and computed value cache. When setting a cell, parse the formula to extract cell references using regex or a simple tokenizer. For example, parsing =A1+B2*3 extracts dependencies [A1, B2]. Store both the forward graph (cell → dependencies) and reverse graph (cell → dependents) for efficient traversal.

Solution

import re
class Cell:
    def __init__(self):
        self.raw_input = None
        self.dependencies = set()
        self.cached_value = None
        self.is_formula = False
class Spreadsheet:
    def __init__(self):
        self.cells = {}
        self.forward_graph = {}
        self.reverse_graph = {}
    
    def parse_formula(self, formula):
        pattern = r'\b([A-Z]+[0-9]+)\b'
        return set(re.findall(pattern, formula))
    
    def set_cell(self, cell_id, value):
        if cell_id not in self.cells:
            self.cells[cell_id] = Cell()
        cell = self.cells[cell_id]
        cell.raw_input = value
        cell.is_formula = isinstance(value, str) and value.startswith('=')
        
        old_deps = self.forward_graph.get(cell_id, set())
        for dep in old_deps:
            self.reverse_graph.get(dep, set()).discard(cell_id)
        
        if cell.is_formula:
            cell.dependencies = self.parse_formula(value)
            self.forward_graph[cell_id] = cell.dependencies
            for dep in cell.dependencies:
                self.reverse_graph.setdefault(dep, set()).add(cell_id)
        else:
            cell.dependencies = set()
            self.forward_graph[cell_id] = set()
            cell.cached_value = value

Dependency Graph Management

Build methods to update the dependency graph when cells change. When a cell's formula is set, remove its old outgoing edges from the graph, parse the new formula to find dependencies, and add new edges. Update the reverse graph so each dependency knows which cells depend on it. For example, changing C1 from =A1+B1 to =A1+D1 removes the edge C1→B1 and adds C1→D1, while updating B1's dependents list to remove C1 and D1's to add C1.

Solution

def update_dependencies(self, cell_id, new_formula):
    # Remove old outgoing edges
    if cell_id in self.dependencies:
        for dep in self.dependencies[cell_id]:
            self.reverse_deps[dep].discard(cell_id)
        del self.dependencies[cell_id]
    
    # Parse new formula for dependencies
    if new_formula and new_formula.startswith('='):
        deps = self.parse_formula(new_formula)
        self.dependencies[cell_id] = set(deps)
        
        # Update reverse graph
        for dep in deps:
            if dep not in self.reverse_deps:
                self.reverse_deps[dep] = set()
            self.reverse_deps[dep].add(cell_id)
    else:
        self.dependencies[cell_id] = set()

Cycle Detection

Implement depth-first search (DFS) to detect circular references before allowing a formula to be set. Starting from the cell being updated, traverse its dependencies recursively while maintaining a visited set for the current path. If we encounter a cell already in the current path, a cycle exists. For example, setting A1=B1 when B1=C1 and C1=A1 would detect the cycle A1→B1→C1→A1 and reject the operation.

Solution

def has_cycle(self, cell_id, new_formula):
    dependencies = self._parse_dependencies(new_formula)
    visited = set()
    
    def dfs(current):
        if current in visited:
            return True
        if current not in self.dependencies:
            return False
        
        visited.add(current)
        for dep in self.dependencies[current]:
            if dfs(dep):
                return True
        visited.remove(current)
        return False
    
    for dep in dependencies:
        if dfs(dep):
            return True
    return False

Evaluation and Memoization

Implement lazy evaluation with caching: compute a cell's value only when requested, and cache the result until dependencies change. When evaluating a formula, recursively get values of dependencies (which may trigger their evaluation), then apply the formula operations. Mark cached values as stale when a cell updates by traversing the reverse graph to find all dependents. For example, updating A1 marks B1=A1*2 and C1=B1+5 as stale, but they're only recomputed when accessed.

Solution

def get_value(self, cell_id):
    if cell_id not in self.cache or self.stale[cell_id]:
        cell = self.cells[cell_id]
        if cell['type'] == 'value':
            self.cache[cell_id] = cell['data']
        else:  # formula
            deps = self.dependencies[cell_id]
            dep_values = {dep: self.get_value(dep) for dep in deps}
            self.cache[cell_id] = self.evaluate_formula(cell['data'], dep_values)
        self.stale[cell_id] = False
    return self.cache[cell_id]
def set_value(self, cell_id, value):
    self.cells[cell_id] = {'type': 'value', 'data': value}
    self.mark_stale(cell_id)
def mark_stale(self, cell_id):
    for dependent in self.reverse_deps.get(cell_id, []):
        if not self.stale[dependent]:
            self.stale[dependent] = True
            self.mark_stale(dependent)

Update Propagation

Implement the public API methods for setting and getting cell values. set(cell, value) updates the cell, invalidates dependent caches, and performs cycle detection if the value is a formula. get(cell) returns the cached value if valid, otherwise evaluates the cell and caches the result. For example, set('A1', 5) followed by get('C1') where C1=A1+10 would evaluate C1 to 15 and cache it for future gets.

Solution

def set(self, cell, value):
    if isinstance(value, str) and value.startswith('='):
        deps = self._parse_formula(value)
        if self._has_cycle(cell, deps):
            raise ValueError(f"Circular reference detected for {cell}")
        self.formulas[cell] = value
        self.dependencies[cell] = deps
    else:
        self.values[cell] = value
        self.formulas.pop(cell, None)
        self.dependencies.pop(cell, None)
    self._invalidate_dependents(cell)
def get(self, cell):
    if cell in self.cache and self.cache_valid.get(cell, False):
        return self.cache[cell]
    result = self._evaluate(cell)
    self.cache[cell] = result
    self.cache_valid[cell] = True
    return result

What We've Learned

Pattern: Use a bidirectional dependency graph (forward for evaluation, reverse for invalidation) to efficiently track and propagate changes
Use Case: This reactive dependency pattern applies to build systems, UI frameworks, and any system where changes cascade through relationships
Optimization: Lazy evaluation with memoization avoids unnecessary computation by only recalculating cells when their values are actually needed
Safety: Always perform cycle detection before adding dependencies to prevent infinite loops from being created in the first place

Problems to Practice

Course Schedule

medium

Graphs

This problem requires building a dependency graph and detecting cycles to determine if courses can be completed, which directly parallels the spreadsheet's need for dependency tracking and cycle detection to prevent circular formula references.

Practice

Course Schedule II

medium

Graphs

Building on Course Schedule, this problem requires topological sorting to determine the order of course completion, similar to how a spreadsheet must determine the correct order to evaluate formulas based on their dependencies.

Practice

Overview

Lesson

Graphs

This lesson covers topological sorting fundamentals, which is the core algorithm needed for managing dependency graphs in spreadsheets to ensure formulas are evaluated in the correct order without circular dependencies.

Learn

Question Timeline

See when this question was last asked and where, including any notes left by other candidates.

Company

Level

Region

Mid December, 2024

OpenAI

Senior

Design a spreadsheet API where cells can contain values or formulas that reference other cells, automatically handling updates when dependencies change. Implement dependency graph tracking to manage cell relationships, cycle detection to prevent infinite loops from circular references, and memoization to cache computed values for efficient recalculation.

Comments

Your account is free and you can post anonymously if you choose.

Hello Interview Premium

Recent interview questions

System Design Guided Practice

Exclusive content

Learn More

Design Spreadsheet with Formulas

DESCRIPTION

Understanding the Problem

Building Intuition

Common Pitfalls

Implementation

Cell Storage and Formula Parsing

Dependency Graph Management

Cycle Detection

Evaluation and Memoization

Update Propagation

What We've Learned

Problems to Practice

Question Timeline

Comments

Questions

Learn

Links

Legal

Contact