
Resumable Iterator

Asked at: OpenAI


DESCRIPTION

Design an iterator system that supports pause and resume functionality through get_state() and set_state() methods, enabling asynchronous iteration across multiple data sources concurrently. The iterator must maintain execution state for each data source independently while supporting non-blocking coroutine-based operations. For example, when iterating over three files simultaneously, pausing at file 1's line 5, file 2's line 3, and file 3's line 7, the state should be serializable and restorable to resume from those exact positions.

Input:

files = ['data1.txt', 'data2.txt']
iterator = AsyncCompositeIterator(files)
await iterator.next()  # Returns first item from data1.txt
state = iterator.get_state()  # Capture current position
await iterator.next()  # Continue iteration
iterator.set_state(state)  # Restore to previous position

Output:

Iterator resumes from the exact position where state was captured, returning the same item that would have been next at that point


Explanation: get_state() captures the positions across all files, and set_state() restores them, enabling pause/resume functionality.

Constraints:

  • Must support asynchronous iteration using async/await syntax
  • State must be serializable (can be converted to/from dictionary or JSON)
  • Must handle multiple concurrent iterators without interference
  • Each data source maintains independent state within the composite iterator
  • Must support non-blocking I/O operations for file reading
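Because the state must survive serialization, a quick sanity check for any state design is a round trip through JSON. The sketch below uses the state shape described later in this write-up; the file names and offsets are illustrative placeholders:

```python
import json

# Example composite state in the shape used throughout this write-up;
# the file names and positions are illustrative placeholders.
state = {
    'iterators': {
        'data1.txt': {'file': 'data1.txt', 'position': 150, 'line': 5},
        'data2.txt': {'file': 'data2.txt', 'position': 200, 'line': 3},
    },
    'current_index': 1,
}

# Round-trip through JSON: every value in the state must be a plain
# type (str/int/dict) -- never a file handle or generator object.
restored = json.loads(json.dumps(state))
assert restored == state
```

If the round trip fails (e.g. a `TypeError` from `json.dumps`), something non-serializable has leaked into the state.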

Understanding the Problem

The core challenge is managing multiple independent iterator states within a composite structure while supporting asynchronous operations. Each underlying iterator (per file) needs its own position tracking, and the composite must coordinate these states without blocking. The get_state() method must capture all sub-iterator positions atomically, while set_state() must restore them accurately. Asynchronous iteration adds complexity since coroutines must be properly managed and resumed from saved states.

Building Intuition

A naive approach would use synchronous iteration with simple position counters, but this blocks on I/O and can't handle concurrent operations. A better approach uses async generators for each file iterator, storing their internal state (file position, buffer state) in a dictionary structure that can be serialized. For example, if iterating three files where file1 is at byte 150, file2 at byte 200, and file3 at byte 75, the state dictionary captures {'file1': {'position': 150}, 'file2': {'position': 200}, 'file3': {'position': 75}}.

This pattern is essential for long-running data processing pipelines where jobs may need to pause due to resource constraints or failures. Being able to checkpoint and resume iteration prevents reprocessing large datasets from scratch. The async nature enables efficient resource utilization when processing multiple data sources simultaneously.

Common Pitfalls

  • Storing live runtime objects (open file handles, generator frames) in the state instead of serializable metadata such as byte offsets and line numbers
  • Capturing sub-iterator states non-atomically while other tasks are still advancing them, producing an inconsistent snapshot
  • Updating position metadata in set_state() without also seeking the underlying file, so the recorded position and the actual read cursor diverge
  • Performing blocking file I/O inside __anext__(), which stalls every other coroutine on the event loop

Implementation

Async File Iterator with State Management

Implement a single-file async iterator that tracks its position and supports state serialization. The iterator uses aiofiles for non-blocking file I/O and maintains both byte offset and line number in its state. The get_state() method returns a dictionary with current position, while set_state() seeks to the stored position. For example, after reading 5 lines, get_state() returns {'file': 'data.txt', 'position': 342, 'line': 5}, allowing restoration to that exact point.

Solution
import aiofiles

class AsyncFileIterator:
    def __init__(self, filepath):
        self.filepath = filepath
        self.position = 0
        self.line_number = 0
        self.file = None

    def __aiter__(self):
        # __aiter__ must return the iterator synchronously (an async
        # __aiter__ is invalid in modern Python), so the file is opened
        # lazily on the first __anext__ call.
        return self

    async def __anext__(self):
        if self.file is None:
            self.file = await aiofiles.open(self.filepath, 'r')
            await self.file.seek(self.position)
        line = await self.file.readline()
        if not line:
            raise StopAsyncIteration
        self.position = await self.file.tell()
        self.line_number += 1
        return line

    def get_state(self):
        # Only plain, serializable values -- never the file handle itself.
        return {'file': self.filepath, 'position': self.position, 'line': self.line_number}

    async def set_state(self, state):
        self.position = state['position']
        self.line_number = state['line']
        if self.file:
            await self.file.seek(self.position)
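The pause/resume contract can be exercised without touching the filesystem. Below is a minimal in-memory analogue (the `ResumableLineIterator` name is illustrative, not part of the solution above) showing the intended get_state()/set_state() round trip:

```python
import asyncio

class ResumableLineIterator:
    """In-memory stand-in for AsyncFileIterator with the same state contract."""
    def __init__(self, lines):
        self.lines = lines
        self.index = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.index >= len(self.lines):
            raise StopAsyncIteration
        line = self.lines[self.index]
        self.index += 1
        return line

    def get_state(self):
        # The entire execution state is one serializable integer.
        return {'index': self.index}

    def set_state(self, state):
        self.index = state['index']

async def demo():
    it = ResumableLineIterator(['a', 'b', 'c'])
    first = await it.__anext__()   # 'a'
    state = it.get_state()         # checkpoint after one item
    second = await it.__anext__()  # 'b'
    it.set_state(state)            # rewind to the checkpoint
    replay = await it.__anext__()  # 'b' again
    return first, second, replay

print(asyncio.run(demo()))  # → ('a', 'b', 'b')
```

The same sequence against AsyncFileIterator should behave identically, with the byte offset playing the role of the index.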
Composite Iterator Coordination

Build a composite iterator that manages multiple async file iterators concurrently using asyncio.gather() or task scheduling. Each sub-iterator is stored in a dictionary keyed by filename, and the composite's get_state() aggregates all sub-states into a single structure. The __anext__() method uses round-robin or concurrent fetching to yield items from multiple sources. For instance, with three files, the state becomes {'iterators': {'file1': {...}, 'file2': {...}, 'file3': {...}}, 'current_index': 1}.

Solution
class CompositeAsyncIterator:
    def __init__(self, file_iterators):
        self.iterators = dict(file_iterators)
        self.current_index = 0
        self.file_names = list(self.iterators.keys())

    def __aiter__(self):
        return self

    def get_state(self):
        return {
            'iterators': {name: it.get_state() for name, it in self.iterators.items()},
            'current_index': self.current_index,
        }

    async def set_state(self, state):
        # Sub-iterator set_state is a coroutine (it may seek an open
        # file), so restoring must be awaited.
        for name, it_state in state['iterators'].items():
            await self.iterators[name].set_state(it_state)
        self.current_index = state['current_index']

    async def __anext__(self):
        if not self.file_names:
            raise StopAsyncIteration
        # Round-robin across sources, skipping exhausted iterators.
        attempts = 0
        while attempts < len(self.file_names):
            name = self.file_names[self.current_index]
            self.current_index = (self.current_index + 1) % len(self.file_names)
            try:
                return await self.iterators[name].__anext__()
            except StopAsyncIteration:
                attempts += 1
        raise StopAsyncIteration
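To see the round-robin interleaving and state aggregation in action without any file I/O, here is a self-contained sketch using small in-memory async iterators (the `ListIter` and `Composite` names are illustrative stand-ins mirroring the solution above):

```python
import asyncio

class ListIter:
    # Minimal resumable async iterator over a list.
    def __init__(self, items):
        self.items, self.index = items, 0

    async def __anext__(self):
        if self.index >= len(self.items):
            raise StopAsyncIteration
        item = self.items[self.index]
        self.index += 1
        return item

    def get_state(self):
        return {'index': self.index}

    async def set_state(self, state):
        self.index = state['index']

class Composite:
    # Round-robin over named sub-iterators, as in the solution above.
    def __init__(self, iterators):
        self.iterators = dict(iterators)
        self.names = list(self.iterators)
        self.current = 0

    def get_state(self):
        return {'iterators': {n: it.get_state() for n, it in self.iterators.items()},
                'current_index': self.current}

    async def set_state(self, state):
        for n, s in state['iterators'].items():
            await self.iterators[n].set_state(s)
        self.current = state['current_index']

    async def __anext__(self):
        for _ in range(len(self.names)):
            name = self.names[self.current]
            self.current = (self.current + 1) % len(self.names)
            try:
                return await self.iterators[name].__anext__()
            except StopAsyncIteration:
                continue  # this source is exhausted; try the next one
        raise StopAsyncIteration

async def demo():
    comp = Composite({'a': ListIter([1, 2]), 'b': ListIter([10, 20])})
    out = [await comp.__anext__(), await comp.__anext__()]  # 1, then 10
    state = comp.get_state()        # checkpoint mid-stream
    out.append(await comp.__anext__())                      # 2
    await comp.set_state(state)     # rewind to the checkpoint
    out.append(await comp.__anext__())                      # 2 again
    return out

print(asyncio.run(demo()))  # → [1, 10, 2, 2]
```

Note how restoring the state rewinds both the sub-iterator positions and the round-robin cursor, so the replayed item matches what would have come next at checkpoint time.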

What We've Learned

  • Pattern: Use async generators with position tracking for pausable iteration over I/O-bound sources
  • State Design: Store serializable metadata (positions, offsets) rather than runtime objects (file handles, generators)
  • Concurrency: Leverage `asyncio` to coordinate multiple iterators without blocking, using task-based scheduling
  • Use Case: Essential for checkpointing in data pipelines, ETL processes, and distributed processing systems
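Putting the checkpointing use case into practice, a pipeline can persist the captured state between process runs. A minimal sketch (the checkpoint path and helper names are placeholders, not part of the solution above):

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    # Write to a temp file and rename, so a crash mid-write never
    # leaves a corrupt checkpoint behind (os.replace is atomic).
    tmp = path + '.tmp'
    with open(tmp, 'w') as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    if not os.path.exists(path):
        return None  # no prior run: start from the beginning
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.gettempdir(), 'iterator_checkpoint.json')
save_checkpoint(path, {'iterators': {'data1.txt': {'position': 342, 'line': 5}},
                       'current_index': 0})
print(load_checkpoint(path)['iterators']['data1.txt']['line'])  # → 5
```

On restart, the loaded dictionary is passed straight to the composite's set_state() to resume where the previous run left off.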

Problems to Practice

Stack

Iterators often use stack-based state management to track execution position and nested contexts. Understanding stack fundamentals is crucial for implementing resumable iterators that need to maintain and restore their execution state across pause/resume cycles.
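As an illustration of that idea, a flattening iterator over nested lists can keep its entire execution state in an explicit stack of [list, next-index] frames; because the indices are plain integers, the position is serializable and resumable (an illustrative sketch, not taken from the lesson):

```python
class NestedIterator:
    # Flattens nested lists; the explicit stack IS the resumable state.
    def __init__(self, data):
        self.stack = [[data, 0]]  # each frame: [list, next index]

    def next(self):
        while self.stack:
            lst, i = self.stack[-1]
            if i >= len(lst):
                self.stack.pop()        # finished this nesting level
                continue
            self.stack[-1][1] = i + 1   # advance before descending
            item = lst[i]
            if isinstance(item, list):
                self.stack.append([item, 0])  # enter nested context
            else:
                return item
        return None  # exhausted

    def get_state(self):
        # The indices alone describe the position; restoring them
        # against the same underlying data resumes iteration exactly.
        return [i for _, i in self.stack]

it = NestedIterator([1, [2, [3]], 4])
print([it.next() for _ in range(4)])  # → [1, 2, 3, 4]
```

Each stack frame plays the same role as a per-file sub-state in the composite iterator: one independent position per nested context.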

Decode String

medium

Stack

This problem requires managing nested state and resuming operations at different levels, similar to how a resumable iterator must track and restore its position through nested data structures. The stack-based approach to handling nested contexts directly parallels iterator state management.
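For reference, the standard stack-based approach to Decode String keeps one (string-so-far, repeat-count) frame per nesting level, mirroring how a resumable iterator snapshots a position per nested context (a sketch of the well-known technique):

```python
def decode_string(s):
    # Stack of (string_so_far, repeat_count) frames, one per '[' context.
    stack, current, count = [], '', 0
    for ch in s:
        if ch.isdigit():
            count = count * 10 + int(ch)   # multi-digit counts like '12['
        elif ch == '[':
            stack.append((current, count)) # save the outer context
            current, count = '', 0
        elif ch == ']':
            prev, k = stack.pop()          # restore the outer context
            current = prev + current * k
        else:
            current += ch
    return current

print(decode_string('3[a2[c]]'))  # → 'accaccacc'
```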

This problem involves managing multiple iterators concurrently (one for each sorted list) and coordinating their states, which directly relates to the composite iterator pattern mentioned in the problem description. It demonstrates how to handle asynchronous-like operations across multiple data sources while maintaining individual progress states.

Question Timeline

See when this question was last asked and where, including any notes left by other candidates.

Early October, 2025

OpenAI

Mid-level

Mid December, 2024

OpenAI

Senior

Design an iterator that can pause and resume its execution state through get_state and set_state methods, extending support to handle multiple files and asynchronous operations. The implementation involves managing composite iterators that can run concurrently while maintaining their individual states. Handle asynchronous iteration using coroutines to enable non-blocking operations across multiple data sources.
