
Resumable Iterator

Asked at: OpenAI


DESCRIPTION

Design an iterator system that supports pause and resume functionality through get_state() and set_state() methods, enabling asynchronous iteration across multiple data sources concurrently. The iterator must maintain execution state for each data source independently while supporting non-blocking coroutine-based operations. For example, when iterating over three files simultaneously, pausing at file 1's line 5, file 2's line 3, and file 3's line 7, the state should be serializable and restorable to resume from those exact positions.

Input:

files = ['data1.txt', 'data2.txt']
iterator = AsyncCompositeIterator(files)
await iterator.next()  # Returns first item from data1.txt
state = iterator.get_state()  # Capture current position
await iterator.next()  # Continue iteration
iterator.set_state(state)  # Restore to previous position

Output:

Iterator resumes from the exact position where state was captured, returning the same item that would have been next at that point


Explanation: get_state() captures the positions across all files, and set_state() restores them, enabling pause/resume functionality.

Constraints:

  • Must support asynchronous iteration using async/await syntax
  • State must be serializable (can be converted to/from dictionary or JSON)
  • Must handle multiple concurrent iterators without interference
  • Each data source maintains independent state within the composite iterator
  • Must support non-blocking I/O operations for file reading
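Because the state must survive serialization, a quick sanity check for any state design is a round trip through JSON. The sketch below uses the state shape described later in this write-up; the file names and offsets are illustrative placeholders:

```python
import json

# Example composite state in the shape used throughout this write-up;
# the file names and positions are illustrative placeholders.
state = {
    'iterators': {
        'data1.txt': {'file': 'data1.txt', 'position': 150, 'line': 5},
        'data2.txt': {'file': 'data2.txt', 'position': 200, 'line': 3},
    },
    'current_index': 1,
}

# Round-trip through JSON: every value in the state must be a plain
# type (str/int/dict) -- never a file handle or generator object.
restored = json.loads(json.dumps(state))
assert restored == state
```

If the round trip fails (e.g. a `TypeError` from `json.dumps`), something non-serializable has leaked into the state.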

Understanding the Problem

The core challenge is managing multiple independent iterator states within a composite structure while supporting asynchronous operations. Each underlying iterator (per file) needs its own position tracking, and the composite must coordinate these states without blocking. The get_state() method must capture all sub-iterator positions atomically, while set_state() must restore them accurately. Asynchronous iteration adds complexity since coroutines must be properly managed and resumed from saved states.

Building Intuition

A naive approach would use synchronous iteration with simple position counters, but this blocks on I/O and can't handle concurrent operations. A better approach uses async generators for each file iterator, storing their internal state (file position, buffer state) in a dictionary structure that can be serialized. For example, if iterating three files where file1 is at byte 150, file2 at byte 200, and file3 at byte 75, the state dictionary captures {'file1': {'position': 150}, 'file2': {'position': 200}, 'file3': {'position': 75}}.

This pattern is essential for long-running data processing pipelines where jobs may need to pause due to resource constraints or failures. Being able to checkpoint and resume iteration prevents reprocessing large datasets from scratch. The async nature enables efficient resource utilization when processing multiple data sources simultaneously.

Common Pitfalls

  • Storing live runtime objects (open file handles, generator frames) in the state instead of serializable metadata such as byte offsets and line numbers
  • Capturing sub-iterator states non-atomically while other tasks are still advancing them, producing an inconsistent snapshot
  • Updating position metadata in set_state() without also seeking the underlying file, so the recorded position and the actual read cursor diverge
  • Performing blocking file I/O inside __anext__(), which stalls every other coroutine on the event loop

Implementation

Async File Iterator with State Management

Implement a single-file async iterator that tracks its position and supports state serialization. The iterator uses aiofiles for non-blocking file I/O and maintains both byte offset and line number in its state. The get_state() method returns a dictionary with current position, while set_state() seeks to the stored position. For example, after reading 5 lines, get_state() returns {'file': 'data.txt', 'position': 342, 'line': 5}, allowing restoration to that exact point.

Solution
import aiofiles

class AsyncFileIterator:
    def __init__(self, filepath):
        self.filepath = filepath
        self.position = 0
        self.line_number = 0
        self.file = None

    def __aiter__(self):
        # __aiter__ must return the iterator synchronously (an async
        # __aiter__ is invalid in modern Python), so the file is opened
        # lazily on the first __anext__ call.
        return self

    async def __anext__(self):
        if self.file is None:
            self.file = await aiofiles.open(self.filepath, 'r')
            await self.file.seek(self.position)
        line = await self.file.readline()
        if not line:
            raise StopAsyncIteration
        self.position = await self.file.tell()
        self.line_number += 1
        return line

    def get_state(self):
        # Only plain, serializable values -- never the file handle itself.
        return {'file': self.filepath, 'position': self.position, 'line': self.line_number}

    async def set_state(self, state):
        self.position = state['position']
        self.line_number = state['line']
        if self.file:
            await self.file.seek(self.position)
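The pause/resume contract can be exercised without touching the filesystem. Below is a minimal in-memory analogue (the `ResumableLineIterator` name is illustrative, not part of the solution above) showing the intended get_state()/set_state() round trip:

```python
import asyncio

class ResumableLineIterator:
    """In-memory stand-in for AsyncFileIterator with the same state contract."""
    def __init__(self, lines):
        self.lines = lines
        self.index = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.index >= len(self.lines):
            raise StopAsyncIteration
        line = self.lines[self.index]
        self.index += 1
        return line

    def get_state(self):
        # The entire execution state is one serializable integer.
        return {'index': self.index}

    def set_state(self, state):
        self.index = state['index']

async def demo():
    it = ResumableLineIterator(['a', 'b', 'c'])
    first = await it.__anext__()   # 'a'
    state = it.get_state()         # checkpoint after one item
    second = await it.__anext__()  # 'b'
    it.set_state(state)            # rewind to the checkpoint
    replay = await it.__anext__()  # 'b' again
    return first, second, replay

print(asyncio.run(demo()))  # → ('a', 'b', 'b')
```

The same sequence against AsyncFileIterator should behave identically, with the byte offset playing the role of the index.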
Composite Iterator Coordination

Build a composite iterator that manages multiple async file iterators concurrently using asyncio.gather() or task scheduling. Each sub-iterator is stored in a dictionary keyed by filename, and the composite's get_state() aggregates all sub-states into a single structure. The __anext__() method uses round-robin or concurrent fetching to yield items from multiple sources. For instance, with three files, the state becomes {'iterators': {'file1': {...}, 'file2': {...}, 'file3': {...}}, 'current_index': 1}.

Solution
class CompositeAsyncIterator:
    def __init__(self, file_iterators):
        self.iterators = dict(file_iterators)
        self.current_index = 0
        self.file_names = list(self.iterators.keys())

    def __aiter__(self):
        return self

    def get_state(self):
        return {
            'iterators': {name: it.get_state() for name, it in self.iterators.items()},
            'current_index': self.current_index,
        }

    async def set_state(self, state):
        # Sub-iterator set_state is a coroutine (it may seek an open
        # file), so restoring must be awaited.
        for name, it_state in state['iterators'].items():
            await self.iterators[name].set_state(it_state)
        self.current_index = state['current_index']

    async def __anext__(self):
        if not self.file_names:
            raise StopAsyncIteration
        # Round-robin across sources, skipping exhausted iterators.
        attempts = 0
        while attempts < len(self.file_names):
            name = self.file_names[self.current_index]
            self.current_index = (self.current_index + 1) % len(self.file_names)
            try:
                return await self.iterators[name].__anext__()
            except StopAsyncIteration:
                attempts += 1
        raise StopAsyncIteration
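To see the round-robin interleaving and state aggregation in action without any file I/O, here is a self-contained sketch using small in-memory async iterators (the `ListIter` and `Composite` names are illustrative stand-ins mirroring the solution above):

```python
import asyncio

class ListIter:
    # Minimal resumable async iterator over a list.
    def __init__(self, items):
        self.items, self.index = items, 0

    async def __anext__(self):
        if self.index >= len(self.items):
            raise StopAsyncIteration
        item = self.items[self.index]
        self.index += 1
        return item

    def get_state(self):
        return {'index': self.index}

    async def set_state(self, state):
        self.index = state['index']

class Composite:
    # Round-robin over named sub-iterators, as in the solution above.
    def __init__(self, iterators):
        self.iterators = dict(iterators)
        self.names = list(self.iterators)
        self.current = 0

    def get_state(self):
        return {'iterators': {n: it.get_state() for n, it in self.iterators.items()},
                'current_index': self.current}

    async def set_state(self, state):
        for n, s in state['iterators'].items():
            await self.iterators[n].set_state(s)
        self.current = state['current_index']

    async def __anext__(self):
        for _ in range(len(self.names)):
            name = self.names[self.current]
            self.current = (self.current + 1) % len(self.names)
            try:
                return await self.iterators[name].__anext__()
            except StopAsyncIteration:
                continue  # this source is exhausted; try the next one
        raise StopAsyncIteration

async def demo():
    comp = Composite({'a': ListIter([1, 2]), 'b': ListIter([10, 20])})
    out = [await comp.__anext__(), await comp.__anext__()]  # 1, then 10
    state = comp.get_state()        # checkpoint mid-stream
    out.append(await comp.__anext__())                      # 2
    await comp.set_state(state)     # rewind to the checkpoint
    out.append(await comp.__anext__())                      # 2 again
    return out

print(asyncio.run(demo()))  # → [1, 10, 2, 2]
```

Note how restoring the state rewinds both the sub-iterator positions and the round-robin cursor, so the replayed item matches what would have come next at checkpoint time.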

What We've Learned

  • Pattern: Use async generators with position tracking for pausable iteration over I/O-bound sources
  • State Design: Store serializable metadata (positions, offsets) rather than runtime objects (file handles, generators)
  • Concurrency: Leverage `asyncio` to coordinate multiple iterators without blocking, using task-based scheduling
  • Use Case: Essential for checkpointing in data pipelines, ETL processes, and distributed processing systems
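Putting the checkpointing use case into practice, a pipeline can persist the captured state between process runs. A minimal sketch (the checkpoint path and helper names are placeholders, not part of the solution above):

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    # Write to a temp file and rename, so a crash mid-write never
    # leaves a corrupt checkpoint behind (os.replace is atomic).
    tmp = path + '.tmp'
    with open(tmp, 'w') as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    if not os.path.exists(path):
        return None  # no prior run: start from the beginning
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.gettempdir(), 'iterator_checkpoint.json')
save_checkpoint(path, {'iterators': {'data1.txt': {'position': 342, 'line': 5}},
                       'current_index': 0})
print(load_checkpoint(path)['iterators']['data1.txt']['line'])  # → 5
```

On restart, the loaded dictionary is passed straight to the composite's set_state() to resume where the previous run left off.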

Problems to Practice

Stack

Iterators often use stack-based state management to track execution position and nested contexts. Understanding stack fundamentals is crucial for implementing resumable iterators that need to maintain and restore their execution state across pause/resume cycles.
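As an illustration of that idea, a flattening iterator over nested lists can keep its entire execution state in an explicit stack of [list, next-index] frames; because the indices are plain integers, the position is serializable and resumable (an illustrative sketch, not taken from the lesson):

```python
class NestedIterator:
    # Flattens nested lists; the explicit stack IS the resumable state.
    def __init__(self, data):
        self.stack = [[data, 0]]  # each frame: [list, next index]

    def next(self):
        while self.stack:
            lst, i = self.stack[-1]
            if i >= len(lst):
                self.stack.pop()        # finished this nesting level
                continue
            self.stack[-1][1] = i + 1   # advance before descending
            item = lst[i]
            if isinstance(item, list):
                self.stack.append([item, 0])  # enter nested context
            else:
                return item
        return None  # exhausted

    def get_state(self):
        # The indices alone describe the position; restoring them
        # against the same underlying data resumes iteration exactly.
        return [i for _, i in self.stack]

it = NestedIterator([1, [2, [3]], 4])
print([it.next() for _ in range(4)])  # → [1, 2, 3, 4]
```

Each stack frame plays the same role as a per-file sub-state in the composite iterator: one independent position per nested context.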

Decode String

medium

Stack

This problem requires managing nested state and resuming operations at different levels, similar to how a resumable iterator must track and restore its position through nested data structures. The stack-based approach to handling nested contexts directly parallels iterator state management.
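For reference, the standard stack-based approach to Decode String keeps one (string-so-far, repeat-count) frame per nesting level, mirroring how a resumable iterator snapshots a position per nested context (a sketch of the well-known technique):

```python
def decode_string(s):
    # Stack of (string_so_far, repeat_count) frames, one per '[' context.
    stack, current, count = [], '', 0
    for ch in s:
        if ch.isdigit():
            count = count * 10 + int(ch)   # multi-digit counts like '12['
        elif ch == '[':
            stack.append((current, count)) # save the outer context
            current, count = '', 0
        elif ch == ']':
            prev, k = stack.pop()          # restore the outer context
            current = prev + current * k
        else:
            current += ch
    return current

print(decode_string('3[a2[c]]'))  # → 'accaccacc'
```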

This problem involves managing multiple iterators concurrently (one for each sorted list) and coordinating their states, which directly relates to the composite iterator pattern mentioned in the problem description. It demonstrates how to handle asynchronous-like operations across multiple data sources while maintaining individual progress states.

Question Timeline

See when this question was last asked and where, including any notes left by other candidates.

Early October, 2025

OpenAI

Mid-level

Mid December, 2024

OpenAI

Senior

Design an iterator that can pause and resume its execution state through get_state and set_state methods, extending support to handle multiple files and asynchronous operations. The implementation involves managing composite iterators that can run concurrently while maintaining their individual states. Handle asynchronous iteration using coroutines to enable non-blocking operations across multiple data sources.
