Limited Time Offer:Up to 20% off Hello Interview Premium
Up to 20% off Hello Interview Premium 🎉
Hello Interview
Learn System Design
Introduction
How to Prepare
Delivery Framework
Core Concepts
Key Technologies
Common Patterns
Question Breakdowns
Networking Essentials
API Design
Data Modeling
Caching
Sharding
Consistent Hashing
CAP Theorem
Database Indexing
Numbers to Know
Bitly
Dropbox
Local Delivery Service
Ticketmaster
FB News Feed
Tinder
LeetCode
WhatsApp
Rate Limiter
FB Live Comments
FB Post Search
YouTube Top K
Uber
YouTube
Web Crawler
Ad Click Aggregator
News Aggregator
Yelp
Strava
Online Auction
Price Tracking Service
Instagram
Robinhood
Google Docs
Distributed Cache
Job Scheduler
Payment System
Metrics Monitoring
ChatGPT
Real-time Updates
Dealing with Contention
Multi-step Processes
Scaling Reads
Scaling Writes
Handling Large Blobs
Managing Long Running Tasks
Redis
Elasticsearch
Kafka
API Gateway
Cassandra
DynamoDB
PostgreSQL
Flink
ZooKeeper
Time Series Databases
Data Structures for Big Data
Vector Databases
Vote For New Content
Pricing
Sign in / Sign up
Search
⌘K
Pricing

Tutor

Patterns

Multi-step Processes

Learn about multi-step processes and how to handle them in your system design with distributed transactions, sagas, workflow systems, and durable execution.

Multi-step Processes

⚙️ Real production systems must survive failures, retries, and long-running operations spanning hours or days. Often they take the form of multi-step processes or sagas which involve the coordination of multiple services and systems. This is a continual source of operational and design challenges for engineers, with a variety of different solutions.

The Problem

Many real-world systems end up coordinating dozens or even hundreds of different services and systems in order to complete a user's request, but building reliable multi-step processes in distributed systems is startlingly hard. While clean systems like databases often get to deal with a single "write" or "read", real applications often need to talk to dozens of (flaky) services to do the user's bidding, and doing this quickly and reliably is a common challenge. Jimmy Bogard has a great talk about this titled "Six Little Lines of Fail" with the premise that distributed systems make even a simple sequence of steps like this surprisingly hard (if you haven't had to deal with systems like this before, it's a great watch).
Consider an e-commerce order fulfillment workflow: charge payment, reserve inventory, create shipping label, wait for a human to pick up the item, send confirmation email, and wait for pickup. Each step involves calling different services or waiting on humans, any of which might fail or timeout. Some steps require us to call out to external systems (like a payment gateway) and wait for them to complete. During the orchestration, your server might crash or be deployed to. And maybe you want to make a change to the ordering or nature of steps! The messy complexity of business needs and real-world infrastructure quickly breaks down our otherwise pure flow chart of steps.
Order Fulfillment Nightmare
There are, of course, patches we can organically make to processes like this. We can fortify each service to handle failures, add compensating actions (like refunds if we can't find the inventory) in every place, use delay queues and hooks to handle waits and human tasks, but overall each of these things makes the system more complex and brittle. We interweave system-level concerns (crashes, retries, failures) with business-level concerns (what happens if we can't find the item?). Not a great design!
Workflow systems and durable execution are the solutions to this problem, and they show up in many system design interviews, particularly when there is a lot of state and a lot of failure handling. Interviewers love to ask questions about this because it dominates the oncall rotation for many production teams and gets at the heart of what makes many distributed systems hard to build well. In this article, we'll cover what they are, how they work, and how to use them in your interviews.
Problem Breakdowns with Multi-step Processes Pattern

Uber

Payment System

Solutions

Distributed Transactions

Two-Phase Commit (2PC)

Saga Pattern

Single Server Orchestration

Event Sourcing

Workflows

Durable Execution Engines

Managed Workflow Systems

Implementations

When to Use in Interviews

Common interview scenarios

When NOT to use it in an interview

Common Deep Dives

"What if your coordinator service crashes during a distributed transaction?"

"How will you handle updates to the workflow?"

Workflow Versioning

Workflow Migrations

"How do we keep the workflow state size in check?"

"How do we deal with external events?"

"How can we ensure X step runs exactly once?"

Conclusion

Purchase Premium to Keep Reading

Unlock this article and so much more with Hello Interview Premium
Buy Premium

Currently up to 20% off

Hello Interview Premium

System Design Guided Practice
Exclusive content
Recent interview questions
Learn More
Reading Progress

On This Page

The Problem

Solutions

Distributed Transactions

Single Server Orchestration

Event Sourcing

Workflows

When to Use in Interviews

Common interview scenarios

When NOT to use it in an interview

Common Deep Dives

"What if your coordinator service crashes during a distributed transaction?"

"How will you handle updates to the workflow?"

"How do we keep the workflow state size in check?"

"How do we deal with external events?"

"How can we ensure X step runs exactly once?"

Conclusion

Questions
Meta SWE Interview QuestionsAmazon SWE Interview QuestionsGoogle SWE Interview QuestionsOpenAI SWE Interview QuestionsEngineering Manager (EM) Interview Questions
Learn
Learn System DesignLearn DSALearn BehavioralLearn ML System DesignLearn Low Level DesignGuided Practice
Links
FAQPricingGift PremiumHello Interview Premium
Legal
Terms and ConditionsPrivacy PolicySecurity
Contact
About UsProduct Support

7511 Greenwood Ave North Unit #4238 Seattle WA 98103


© 2026 Optick Labs Inc. All rights reserved.