Limited Time Offer:Up to 25% off Hello Interview Premium

Your Dashboard

Interview Coaching

Mock Interviews

1:1 Mentorship

Salary Negotiation

Practice

Learn

System Design

ML System Design

Low Level Design

Code

Behavioral

Salary Negotiation

Interview Guides

Blog

Community

Interview Questions

Interview Experiences

Peer System Design Library

Ask The Community

Discord

Refer a Friend

Pricing

Become a Coach

⌘K

Pricing Become a Coach

⌘K

Tutor

Get Premium

In-Memory Database: Implement SQL-Like Operations

Asked at:

OpenAI

DESCRIPTION

Build an in-memory database that supports SQL-like operations including INSERT, SELECT with WHERE clauses, and ORDER BY functionality. Implement efficient data storage structures, a query parser to evaluate conditions, and inverted indexes to optimize search performance. For example, after inserting records like {id: 1, name: 'Alice', age: 30}, you should be able to query SELECT * WHERE age > 25 ORDER BY name and retrieve matching records efficiently.

Input:

INSERT {id: 1, name: 'Alice', age: 30}
INSERT {id: 2, name: 'Bob', age: 25}
SELECT * WHERE age > 25 ORDER BY name ASC

Output:

[{id: 1, name: 'Alice', age: 30}]

Explanation: Only Alice matches the condition age > 25, returned in ascending order by name

Constraints:

Support INSERT operations with arbitrary key-value pairs
Support SELECT with multiple WHERE conditions (AND logic)
Support ORDER BY with ASC/DESC sorting on any field
Implement inverted indexes for efficient field-based lookups
Handle comparison operators: =, !=, <, >, <=, >=
Records can have different schemas (flexible fields)

Understanding the Problem

The challenge involves three interconnected components: storage, querying, and optimization. You need a flexible data structure to store records with varying schemas, a parser to interpret SQL-like query syntax and evaluate conditions, and inverted indexes to avoid full table scans. The complexity lies in efficiently filtering records based on multiple conditions while maintaining sorted results. Balancing memory usage with query performance requires careful index design.

Building Intuition

A naive approach would store all records in a list and scan every record for each query, checking conditions one by one—this becomes O(n) per query. A better approach uses inverted indexes that map field values to record IDs, allowing direct lookup of matching records in O(1) for equality checks. For example, an index on age would map {30: [record1], 25: [record2]}, letting you instantly find all records with age = 30 without scanning.

Real-world databases handle millions of records where full table scans are prohibitively expensive. Inverted indexes are the foundation of search engines and database query optimizers, enabling sub-second response times. Understanding this pattern teaches you how to trade memory for speed—a critical skill in system design.

Common Pitfalls

Pitfall: Re-parsing query strings on every execution → Fix: Parse once and cache the query plan

Pitfall: Building indexes for every field upfront → Fix: Create indexes lazily or only for frequently queried fields

Pitfall: Not handling missing fields in records → Fix: Treat missing fields as null and handle gracefully in comparisons

Pitfall: Sorting entire dataset before filtering → Fix: Filter first to reduce the dataset, then sort

Implementation

Data Storage Layer with Inverted Indexes

Implement the core storage engine using a list to hold records and inverted indexes (hash maps) to map field values to record IDs. Each index tracks which records contain specific values for a field, enabling O(1) lookups. For example, inserting {id: 1, age: 30} updates the age index to map 30 → [1]. This section focuses solely on storage and index maintenance during insertions.

Solution

class Database:
    def __init__(self):
        self.records = []
        self.inverted_indexes = {}
    
    def insert(self, record):
        record_id = len(self.records)
        self.records.append(record)
        for field, value in record.items():
            if field not in self.inverted_indexes:
                self.inverted_indexes[field] = {}
            if value not in self.inverted_indexes[field]:
                self.inverted_indexes[field][value] = []
            self.inverted_indexes[field][value].append(record_id)

Query Parser and Condition Evaluator

Build a query parser that extracts WHERE conditions and ORDER BY clauses from query strings, then implements a condition evaluator to check if records match criteria. Parse conditions into structured objects with field, operator, and value components. For example, age > 25 becomes {field: 'age', operator: '>', value: 25}. This section uses the storage layer from Section 1 but focuses on parsing and evaluation logic.

Solution

class QueryParser:
    def __init__(self):
        self.operators = {'=': lambda a, b: a == b, '>': lambda a, b: a > b, '<': lambda a, b: a < b, '>=': lambda a, b: a >= b, '<=': lambda a, b: a <= b, '!=': lambda a, b: a != b}
    
    def parse_query(self, query):
        conditions, order_by = [], None
        if 'WHERE' in query:
            where_part = query.split('WHERE')[1].split('ORDER BY')[0].strip()
            for cond in where_part.split('AND'):
                cond = cond.strip()
                for op in ['>=', '<=', '!=', '=', '>', '<']:
                    if op in cond:
                        field, value = cond.split(op)
                        conditions.append({'field': field.strip(), 'operator': op, 'value': self._cast_value(value.strip())})
                        break
        if 'ORDER BY' in query:
            order_by = query.split('ORDER BY')[1].strip()
        return conditions, order_by
    
    def _cast_value(self, value):
        if value.isdigit(): return int(value)
        try: return float(value)
        except: return value.strip("'\"")
    
    def evaluate(self, record, conditions):
        return all(self.operators[c['operator']](record.get(c['field']), c['value']) for c in conditions)

Query Executor with Index Optimization

Implement the query execution engine that combines the parser (Section 2) and storage layer (Section 1) to process SELECT queries efficiently. Use inverted indexes to fetch candidate records for equality conditions, then filter with remaining conditions and apply sorting. For example, WHERE age = 30 AND name = 'Alice' first looks up age index, then filters by name. This section orchestrates the complete query flow.

Solution

class QueryExecutor:
    def __init__(self, database, parser):
        self.db = database
        self.parser = parser
    
    def execute(self, query):
        conditions, order_by = self.parser.parse_query(query)
        candidate_ids = self._get_candidates_from_index(conditions)
        results = [self.db.records[rid] for rid in candidate_ids if self.parser.evaluate(self.db.records[rid], conditions)]
        if order_by:
            results.sort(key=lambda r: r.get(order_by, 0))
        return results
    
    def _get_candidates_from_index(self, conditions):
        eq_conditions = [c for c in conditions if c['operator'] == '=']
        if not eq_conditions:
            return range(len(self.db.records))
        field, value = eq_conditions[0]['field'], eq_conditions[0]['value']
        return self.db.inverted_indexes.get(field, {}).get(value, [])

Complete Integration and Usage

Combine Section 1's storage engine, Section 2's parser, and Section 3's executor into a unified Database class with a simple API. Provide methods like insert(record) and query(sql_string) that internally coordinate all components. Show end-to-end examples of inserting multiple records and executing complex queries with multiple conditions and sorting, demonstrating how the inverted indexes accelerate lookups compared to full scans.

Solution

class InMemoryDatabase:
    def __init__(self):
        self.storage = Database()
        self.parser = QueryParser()
        self.executor = QueryExecutor(self.storage, self.parser)
    
    def insert(self, record):
        self.storage.insert(record)
    
    def query(self, sql_string):
        return self.executor.execute(sql_string)
# Usage example
db = InMemoryDatabase()
db.insert({'id': 1, 'name': 'Alice', 'age': 30, 'city': 'NYC'})
db.insert({'id': 2, 'name': 'Bob', 'age': 25, 'city': 'LA'})
db.insert({'id': 3, 'name': 'Charlie', 'age': 35, 'city': 'NYC'})
results = db.query("SELECT * WHERE city = 'NYC' AND age >= 30 ORDER BY age")

What We've Learned

Pattern: Inverted indexes trade memory for query speed by pre-computing field-to-record mappings
Use Case: Essential for search engines, databases, and any system requiring fast filtered lookups
Optimization: Filter with indexed fields first to minimize the candidate set before applying expensive operations
Design: Separate parsing, storage, and execution concerns for maintainable and testable code

Problems to Practice

Implement Trie Methods

medium

Trie

Building a Trie involves implementing data structure methods similar to an in-memory database. Both require efficient data storage structures, insert operations, and query functionality with optimized search performance through specialized indexing.

Practice

Subarray Sum Equals K

medium

Prefix Sum

This problem uses hashmaps to optimize query operations, similar to how inverted indexes work in databases. It demonstrates efficient data retrieval using hash-based lookups and condition evaluation, which are core concepts in database query optimization.

Practice

Overview

Lesson

Trie

This lesson provides foundational understanding of Trie data structures, which share similarities with database indexing strategies. Understanding Tries helps grasp how efficient search structures work, which is essential for implementing inverted indexes in an in-memory database.

Learn

Question Timeline

See when this question was last asked and where, including any notes left by other candidates.

Company

Level

Region

Mid November, 2025

OpenAI

Staff

During Screening

Mid October, 2025

OpenAI

Mid-level

Early October, 2025

OpenAI

Senior

Get Premium to View All 6 Reports

Comments

Your account is free and you can post anonymously if you choose.

Hello Interview Premium

Recent interview questions

System Design Guided Practice

Exclusive content

Learn More

In-Memory Database: Implement SQL-Like Operations

DESCRIPTION

Understanding the Problem

Building Intuition

Common Pitfalls

Implementation

Data Storage Layer with Inverted Indexes

Query Parser and Condition Evaluator

Query Executor with Index Optimization

Complete Integration and Usage

What We've Learned

Problems to Practice

Question Timeline

Comments

Questions

Learn

Links

Legal

Contact