In-Memory Database: Implement SQL-Like Operations
Asked at:
OpenAI
DESCRIPTION
Build an in-memory database that supports SQL-like operations including INSERT, SELECT with WHERE clauses, and ORDER BY functionality. Implement efficient data storage structures, a query parser to evaluate conditions, and inverted indexes to optimize search performance. For example, after inserting records like {id: 1, name: 'Alice', age: 30}, you should be able to query SELECT * WHERE age > 25 ORDER BY name and retrieve matching records efficiently.
Input:
INSERT {id: 1, name: 'Alice', age: 30} INSERT {id: 2, name: 'Bob', age: 25} SELECT * WHERE age > 25 ORDER BY name ASC
Output:
[{id: 1, name: 'Alice', age: 30}]
Explanation: Only Alice matches the condition age > 25, returned in ascending order by name
Constraints:
- Support INSERT operations with arbitrary key-value pairs
- Support SELECT with multiple WHERE conditions (AND logic)
- Support ORDER BY with ASC/DESC sorting on any field
- Implement inverted indexes for efficient field-based lookups
- Handle comparison operators: =, !=, <, >, <=, >=
- Records can have different schemas (flexible fields)
Understanding the Problem
The challenge involves three interconnected components: storage, querying, and optimization. You need a flexible data structure to store records with varying schemas, a parser to interpret SQL-like query syntax and evaluate conditions, and inverted indexes to avoid full table scans. The complexity lies in efficiently filtering records based on multiple conditions while maintaining sorted results. Balancing memory usage with query performance requires careful index design.
Building Intuition
A naive approach would store all records in a list and scan every record for each query, checking conditions one by one—this becomes O(n) per query. A better approach uses inverted indexes that map field values to record IDs, allowing direct lookup of matching records in O(1) for equality checks. For example, an index on age would map {30: [record1], 25: [record2]}, letting you instantly find all records with age = 30 without scanning.
Real-world databases handle millions of records where full table scans are prohibitively expensive. Inverted indexes are the foundation of search engines and database query optimizers, enabling sub-second response times. Understanding this pattern teaches you how to trade memory for speed—a critical skill in system design.
Common Pitfalls
Implementation
Data Storage Layer with Inverted Indexes
Implement the core storage engine using a list to hold records and inverted indexes (hash maps) to map field values to record IDs. Each index tracks which records contain specific values for a field, enabling O(1) lookups. For example, inserting {id: 1, age: 30} updates the age index to map 30 → [1]. This section focuses solely on storage and index maintenance during insertions.
class Database:def __init__(self):self.records = []self.inverted_indexes = {}def insert(self, record):record_id = len(self.records)self.records.append(record)for field, value in record.items():if field not in self.inverted_indexes:self.inverted_indexes[field] = {}if value not in self.inverted_indexes[field]:self.inverted_indexes[field][value] = []self.inverted_indexes[field][value].append(record_id)
Query Parser and Condition Evaluator
Build a query parser that extracts WHERE conditions and ORDER BY clauses from query strings, then implements a condition evaluator to check if records match criteria. Parse conditions into structured objects with field, operator, and value components. For example, age > 25 becomes {field: 'age', operator: '>', value: 25}. This section uses the storage layer from Section 1 but focuses on parsing and evaluation logic.
class QueryParser:def __init__(self):self.operators = {'=': lambda a, b: a == b, '>': lambda a, b: a > b, '<': lambda a, b: a < b, '>=': lambda a, b: a >= b, '<=': lambda a, b: a <= b, '!=': lambda a, b: a != b}def parse_query(self, query):conditions, order_by = [], Noneif 'WHERE' in query:where_part = query.split('WHERE')[1].split('ORDER BY')[0].strip()for cond in where_part.split('AND'):cond = cond.strip()for op in ['>=', '<=', '!=', '=', '>', '<']:if op in cond:field, value = cond.split(op)conditions.append({'field': field.strip(), 'operator': op, 'value': self._cast_value(value.strip())})breakif 'ORDER BY' in query:order_by = query.split('ORDER BY')[1].strip()return conditions, order_bydef _cast_value(self, value):if value.isdigit(): return int(value)try: return float(value)except: return value.strip("'\"")def evaluate(self, record, conditions):return all(self.operators[c['operator']](record.get(c['field']), c['value']) for c in conditions)
Query Executor with Index Optimization
Implement the query execution engine that combines the parser (Section 2) and storage layer (Section 1) to process SELECT queries efficiently. Use inverted indexes to fetch candidate records for equality conditions, then filter with remaining conditions and apply sorting. For example, WHERE age = 30 AND name = 'Alice' first looks up age index, then filters by name. This section orchestrates the complete query flow.
class QueryExecutor:def __init__(self, database, parser):self.db = databaseself.parser = parserdef execute(self, query):conditions, order_by = self.parser.parse_query(query)candidate_ids = self._get_candidates_from_index(conditions)results = [self.db.records[rid] for rid in candidate_ids if self.parser.evaluate(self.db.records[rid], conditions)]if order_by:results.sort(key=lambda r: r.get(order_by, 0))return resultsdef _get_candidates_from_index(self, conditions):eq_conditions = [c for c in conditions if c['operator'] == '=']if not eq_conditions:return range(len(self.db.records))field, value = eq_conditions[0]['field'], eq_conditions[0]['value']return self.db.inverted_indexes.get(field, {}).get(value, [])
Complete Integration and Usage
Combine Section 1's storage engine, Section 2's parser, and Section 3's executor into a unified Database class with a simple API. Provide methods like insert(record) and query(sql_string) that internally coordinate all components. Show end-to-end examples of inserting multiple records and executing complex queries with multiple conditions and sorting, demonstrating how the inverted indexes accelerate lookups compared to full scans.
class InMemoryDatabase:def __init__(self):self.storage = Database()self.parser = QueryParser()self.executor = QueryExecutor(self.storage, self.parser)def insert(self, record):self.storage.insert(record)def query(self, sql_string):return self.executor.execute(sql_string)# Usage exampledb = InMemoryDatabase()db.insert({'id': 1, 'name': 'Alice', 'age': 30, 'city': 'NYC'})db.insert({'id': 2, 'name': 'Bob', 'age': 25, 'city': 'LA'})db.insert({'id': 3, 'name': 'Charlie', 'age': 35, 'city': 'NYC'})results = db.query("SELECT * WHERE city = 'NYC' AND age >= 30 ORDER BY age")
What We've Learned
- Pattern: Inverted indexes trade memory for query speed by pre-computing field-to-record mappings
- Use Case: Essential for search engines, databases, and any system requiring fast filtered lookups
- Optimization: Filter with indexed fields first to minimize the candidate set before applying expensive operations
- Design: Separate parsing, storage, and execution concerns for maintainable and testable code
Problems to Practice
medium
Building a Trie involves implementing data structure methods similar to an in-memory database. Both require efficient data storage structures, insert operations, and query functionality with optimized search performance through specialized indexing.
medium
This problem uses hashmaps to optimize query operations, similar to how inverted indexes work in databases. It demonstrates efficient data retrieval using hash-based lookups and condition evaluation, which are core concepts in database query optimization.
This lesson provides foundational understanding of Trie data structures, which share similarities with database indexing strategies. Understanding Tries helps grasp how efficient search structures work, which is essential for implementing inverted indexes in an in-memory database.
Question Timeline
See when this question was last asked and where, including any notes left by other candidates.
Mid November, 2025
OpenAI
Staff
During Screening
Mid October, 2025
OpenAI
Mid-level
Early October, 2025
OpenAI
Senior
Comments
Hello Interview Premium
Your account is free and you can post anonymously if you choose.