Resumable Iterator
Asked at:
OpenAI
DESCRIPTION
Design an iterator system that supports pause and resume functionality through get_state() and set_state() methods, enabling asynchronous iteration across multiple data sources concurrently. The iterator must maintain execution state for each data source independently while supporting non-blocking coroutine-based operations. For example, when iterating over three files simultaneously, pausing at file 1's line 5, file 2's line 3, and file 3's line 7, the state should be serializable and restorable to resume from those exact positions.
Input:
files = ['data1.txt', 'data2.txt']
iterator = AsyncCompositeIterator(files)
await iterator.next()         # Returns first item from data1.txt
state = iterator.get_state()  # Capture current position
await iterator.next()         # Continue iteration
iterator.set_state(state)     # Restore to previous position
Output:
Iterator resumes from the exact position where state was captured, returning the same item that would have been next at that point
Explanation: The get_state() method captures positions across all files, and set_state() restores them, enabling pause/resume functionality.
Constraints:
- Must support asynchronous iteration using async/await syntax
- State must be serializable (can be converted to/from dictionary or JSON)
- Must handle multiple concurrent iterators without interference
- Each data source maintains independent state within the composite iterator
- Must support non-blocking I/O operations for file reading
Understanding the Problem
The core challenge is managing multiple independent iterator states within a composite structure while supporting asynchronous operations. Each underlying iterator (per file) needs its own position tracking, and the composite must coordinate these states without blocking. The get_state() method must capture all sub-iterator positions atomically, while set_state() must restore them accurately. Asynchronous iteration adds complexity since coroutines must be properly managed and resumed from saved states.
Building Intuition
A naive approach would use synchronous iteration with simple position counters, but this blocks on I/O and can't handle concurrent operations. A better approach uses async generators for each file iterator, storing their internal state (file position, buffer state) in a dictionary structure that can be serialized. For example, if iterating three files where file1 is at byte 150, file2 at byte 200, and file3 at byte 75, the state dictionary captures {'file1': {'position': 150}, 'file2': {'position': 200}, 'file3': {'position': 75}}.
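Because the state dictionary above holds only plain values, it serializes cleanly; a minimal sketch of the checkpoint round-trip (the filenames and byte offsets are the illustrative values from the example):

```python
import json

# Hypothetical snapshot of three file iterators, as described above.
state = {
    'file1': {'position': 150},
    'file2': {'position': 200},
    'file3': {'position': 75},
}

# Only ints and strings inside, so JSON round-trips without loss.
blob = json.dumps(state)
restored = json.loads(blob)
assert restored == state
```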
This pattern is essential for long-running data processing pipelines where jobs may need to pause due to resource constraints or failures. Being able to checkpoint and resume iteration prevents reprocessing large datasets from scratch. The async nature enables efficient resource utilization when processing multiple data sources simultaneously.
Common Pitfalls
- Serializing runtime objects (file handles, open generators) instead of plain metadata like byte offsets; state must round-trip through a dictionary or JSON
- Capturing sub-iterator states non-atomically, so a later restore leaves the sources at mutually inconsistent positions
- Forgetting to seek the underlying file after set_state(), so iteration silently continues from the old position
Implementation
Async File Iterator with State Management
Implement a single-file async iterator that tracks its position and supports state serialization. The iterator uses aiofiles for non-blocking file I/O and maintains both byte offset and line number in its state. The get_state() method returns a dictionary with current position, while set_state() seeks to the stored position. For example, after reading 5 lines, get_state() returns {'file': 'data.txt', 'position': 342, 'line': 5}, allowing restoration to that exact point.
import aiofiles

class AsyncFileIterator:
    def __init__(self, filepath):
        self.filepath = filepath
        self.position = 0
        self.line_number = 0
        self.file = None

    def __aiter__(self):
        # __aiter__ must be a plain method returning the iterator;
        # the file is opened lazily on the first __anext__ call.
        return self

    async def __anext__(self):
        if self.file is None:
            self.file = await aiofiles.open(self.filepath, 'r')
            await self.file.seek(self.position)
        line = await self.file.readline()
        if not line:
            await self.file.close()
            raise StopAsyncIteration
        self.position = await self.file.tell()
        self.line_number += 1
        return line

    def get_state(self):
        return {'file': self.filepath, 'position': self.position, 'line': self.line_number}

    async def set_state(self, state):
        self.position = state['position']
        self.line_number = state['line']
        if self.file:
            await self.file.seek(self.position)
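To exercise the get_state()/set_state() contract in isolation, without touching the filesystem, here is a self-contained in-memory analogue (AsyncListIterator is a hypothetical stand-in for AsyncFileIterator, not part of the solution above):

```python
import asyncio

class AsyncListIterator:
    """In-memory stand-in: same get_state/set_state contract as the
    file iterator, but it walks a list of lines instead of a file."""
    def __init__(self, name, lines):
        self.name = name
        self.lines = lines
        self.index = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.index >= len(self.lines):
            raise StopAsyncIteration
        line = self.lines[self.index]
        self.index += 1
        await asyncio.sleep(0)  # yield control, mimicking non-blocking I/O
        return line

    def get_state(self):
        return {'name': self.name, 'index': self.index}

    def set_state(self, state):
        self.index = state['index']

async def demo():
    it = AsyncListIterator('data.txt', ['a\n', 'b\n', 'c\n'])
    first = await it.__anext__()   # consume one item
    state = it.get_state()         # snapshot after the first item
    second = await it.__anext__()  # keep iterating
    it.set_state(state)            # rewind to the snapshot
    replay = await it.__anext__()  # same item as `second` again
    return first, second, replay

print(asyncio.run(demo()))  # -> ('a\n', 'b\n', 'b\n')
```

Restoring the snapshot replays exactly the item that was next when get_state() was called, which is the behavior the problem statement requires.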
Composite Iterator Coordination
Build a composite iterator that manages multiple async file iterators concurrently using asyncio.gather() or task scheduling. Each sub-iterator is stored in a dictionary keyed by filename, and the composite's get_state() aggregates all sub-states into a single structure. The __anext__() method uses round-robin or concurrent fetching to yield items from multiple sources. For instance, with three files, the state becomes {'iterators': {'file1': {...}, 'file2': {...}, 'file3': {...}}, 'current_index': 1}.
import asyncio

class CompositeAsyncIterator:
    def __init__(self, file_iterators):
        self.iterators = dict(file_iterators)
        self.current_index = 0
        self.file_names = list(self.iterators.keys())

    def __aiter__(self):
        return self

    def get_state(self):
        return {
            'iterators': {name: it.get_state() for name, it in self.iterators.items()},
            'current_index': self.current_index,
        }

    async def set_state(self, state):
        # The sub-iterator's set_state is a coroutine (it may seek),
        # so it must be awaited here.
        for name, it_state in state['iterators'].items():
            await self.iterators[name].set_state(it_state)
        self.current_index = state['current_index']

    async def __anext__(self):
        if not self.file_names:
            raise StopAsyncIteration
        # Round-robin across sources, skipping exhausted ones.
        attempts = 0
        while attempts < len(self.file_names):
            name = self.file_names[self.current_index]
            self.current_index = (self.current_index + 1) % len(self.file_names)
            try:
                return await self.iterators[name].__anext__()
            except StopAsyncIteration:
                attempts += 1
        raise StopAsyncIteration
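The composite's round-robin and state aggregation can be demonstrated end to end with tiny in-memory sub-iterators (MemIter and Composite below are simplified, hypothetical analogues of the classes above):

```python
import asyncio

class MemIter:
    # Minimal async sub-iterator exposing the same state contract (sketch).
    def __init__(self, items):
        self.items, self.i = items, 0
    async def __anext__(self):
        if self.i >= len(self.items):
            raise StopAsyncIteration
        item = self.items[self.i]
        self.i += 1
        return item
    def get_state(self):
        return {'i': self.i}
    def set_state(self, state):
        self.i = state['i']

class Composite:
    # Round-robin over named sub-iterators; aggregates their states.
    def __init__(self, iterators):
        self.iterators = iterators
        self.names = list(iterators)
        self.current = 0
    def get_state(self):
        return {'iterators': {n: it.get_state() for n, it in self.iterators.items()},
                'current': self.current}
    def set_state(self, state):
        for n, s in state['iterators'].items():
            self.iterators[n].set_state(s)
        self.current = state['current']
    async def __anext__(self):
        for _ in range(len(self.names)):
            name = self.names[self.current]
            self.current = (self.current + 1) % len(self.names)
            try:
                return await self.iterators[name].__anext__()
            except StopAsyncIteration:
                continue  # this source is exhausted; try the next one
        raise StopAsyncIteration

async def demo():
    comp = Composite({'f1': MemIter([1, 2]), 'f2': MemIter([10, 20])})
    a = await comp.__anext__()   # first item, from f1
    state = comp.get_state()     # snapshot mid-rotation
    b = await comp.__anext__()   # next item, from f2
    comp.set_state(state)        # restore the snapshot
    c = await comp.__anext__()   # same item as `b` again
    return a, b, c

print(asyncio.run(demo()))  # -> (1, 10, 10)
```

Note that the snapshot includes current_index (here `current`), so a restore resumes not just each source's position but also whose turn it is in the rotation.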
What We've Learned
- Pattern: Use async generators with position tracking for pausable iteration over I/O-bound sources
- State Design: Store serializable metadata (positions, offsets) rather than runtime objects (file handles, generators)
- Concurrency: Leverage `asyncio` to coordinate multiple iterators without blocking, using task-based scheduling
- Use Case: Essential for checkpointing in data pipelines, ETL processes, and distributed processing systems
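The concurrency point can be made concrete: asyncio.gather drives several simulated reads at once, so total latency tracks the slowest source rather than the sum of all of them (the source names, delays, and values below are illustrative):

```python
import asyncio

async def fetch(name, delay, value):
    # Simulated non-blocking read from one source.
    await asyncio.sleep(delay)
    return name, value

async def main():
    # All three "reads" run concurrently; gather preserves argument order.
    results = await asyncio.gather(
        fetch('file1', 0.03, 'line A'),
        fetch('file2', 0.01, 'line B'),
        fetch('file3', 0.02, 'line C'),
    )
    return dict(results)

print(asyncio.run(main()))
```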
Problems to Practice
medium
This problem requires managing nested state and resuming operations at different levels, similar to how a resumable iterator must track and restore its position through nested data structures. The stack-based approach to handling nested contexts directly parallels iterator state management.
hard
This problem involves managing multiple iterators concurrently (one for each sorted list) and coordinating their states, which directly relates to the composite iterator pattern mentioned in the problem description. It demonstrates how to handle asynchronous-like operations across multiple data sources while maintaining individual progress states.
Question Timeline
See when this question was last asked and where, including any notes left by other candidates.
Early October, 2025
OpenAI
Mid-level
Mid December, 2024
OpenAI
Senior
Design an iterator that can pause and resume its execution state through get_state and set_state methods, extending support to handle multiple files and asynchronous operations. The implementation involves managing composite iterators that can run concurrently while maintaining their individual states. Handle asynchronous iteration using coroutines to enable non-blocking operations across multiple data sources.