Design a Payment System
Design a payment system that processes transactions through an external payment service for approval/denial, holds approved amounts, and batches all transactions daily for final processing while handling 10,000 transactions per second.
Asked at:
OpenAI
A Stripe-like payment processor authorizes card payments, places temporary holds on funds, and later settles (captures) money to merchants. In this design, each transaction is checked by an external processor in real time for approval or denial, the approved amount is held, and a daily batch captures all eligible holds. Interviewers ask this to test correctness under high throughput, integration with unreliable external systems, and your ability to model money flows safely. They want to see durable workflows (authorize → hold → capture/refund), idempotency, a ledger for auditability, and a robust daily batch pipeline that scales to 10k QPS without double charging or data loss.
Hello Interview Problem Breakdown
System design answer key for designing a payment system like Stripe, built by FAANG managers and staff engineers.
Common Functional Requirements
Most candidates end up covering this set of core functionalities
Users should be able to submit a payment authorization (hold) and receive an immediate approved/denied decision.
Users should be able to have approved funds held until a scheduled daily capture settles the payment.
Users should be able to cancel or adjust a previously authorized payment before capture, when supported.
Users should be able to view the lifecycle and status of each payment (created, authorized, captured, canceled, failed).
Common Deep Dives
Common follow-up questions interviewers like to ask for this question
Payments are notorious for retries and duplicate requests (client retries, network timeouts, worker restarts). Getting this wrong leads to double charges or phantom declines. Interviewers are looking for a clear idempotency strategy and durable state transitions.
- Use an idempotency key (e.g., merchant_id + order_id) per logical payment, stored in a fast dedupe store (e.g., Redis) and persisted in the ledger; treat the key as the primary identity for state transitions.
- Make all producer/consumer paths idempotent: publish events keyed by payment_id (Kafka key) to preserve per-payment ordering; implement an outbox/inbox pattern so writes and publishes cannot diverge.
- Apply optimistic concurrency or versioned states on the payment record; reject or no-op unexpected transitions (e.g., capture without prior authorize) to handle late/out-of-order callbacks.
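The idempotency key and versioned-state ideas above can be sketched as follows. This is a minimal in-memory illustration, not a production design: `PaymentStore`, the `ALLOWED` transition table, and the return strings are hypothetical names, and a real system would back this with a database row and a conditional update.

```python
from dataclasses import dataclass

# Allowed transitions, using the lifecycle states listed in the requirements.
ALLOWED = {
    "created": {"authorized", "failed"},
    "authorized": {"captured", "canceled"},
}


@dataclass
class Payment:
    key: str            # idempotency key, e.g. "merchant_id:order_id"
    amount_cents: int
    state: str = "created"
    version: int = 0    # bumped on every applied transition


class PaymentStore:
    """Hypothetical in-memory stand-in for the payment table."""

    def __init__(self):
        self._rows = {}

    def create(self, key, amount_cents):
        # A retried create with the same key returns the existing record
        # instead of starting a second payment.
        if key not in self._rows:
            self._rows[key] = Payment(key, amount_cents)
        return self._rows[key]

    def transition(self, key, new_state, expected_version):
        payment = self._rows[key]
        if payment.state == new_state:
            return "noop"        # duplicate delivery of the same event
        if payment.version != expected_version:
            return "conflict"    # another writer got there first
        if new_state not in ALLOWED.get(payment.state, set()):
            return "rejected"    # e.g. capture without prior authorize
        payment.state = new_state
        payment.version += 1
        return "applied"
```

A caller reads the current version, attempts the transition, and re-reads on "conflict". A redelivered event comes back as "noop", so replaying it never charges twice.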
The daily batch is a long-running, high-stakes job touching millions of rows. It must be chunked, resumable, and re-entrant to survive failures and deploys. It must also honor processor rate limits and tolerate partial failures.
- Partition captures by merchant and time window, then process in idempotent chunks with checkpoints/watermarks recorded in control tables so reruns pick up exactly where they left off.
- Use a multi-step pattern: mark eligible holds → enqueue capture intents → perform capture → record capture result; make each step idempotent and write audit logs for reconciliation.
- Implement backpressure, concurrency control, and rate limiting toward the external processor; use retries with exponential backoff and dead-letter queues for persistent failures.
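A rough sketch of the chunk-and-checkpoint idea described above, under simplifying assumptions: holds are pre-sorted by a numeric `hold_id`, the `checkpoint` dict stands in for a control-table row, and `capture` is a caller-supplied function that must itself be idempotent. All names here are illustrative.

```python
def run_capture_batch(holds, checkpoint, capture, chunk_size=2):
    """Capture eligible holds in ordered chunks, resuming from a checkpoint.

    holds:      list of (hold_id, amount_cents), sorted by hold_id
    checkpoint: dict acting as a hypothetical control-table row
    capture:    callable(hold_id, amount_cents); may raise on failure
    """
    last_done = checkpoint.get("last_hold_id", 0)
    pending = [h for h in holds if h[0] > last_done]
    for start in range(0, len(pending), chunk_size):
        chunk = pending[start:start + chunk_size]
        for hold_id, amount_cents in chunk:
            capture(hold_id, amount_cents)
        # Advance the watermark only after the whole chunk has succeeded.
        checkpoint["last_hold_id"] = chunk[-1][0]
```

If a capture fails partway through a chunk, the watermark stays at the previous chunk boundary. A rerun therefore repeats the earlier items of that chunk, which is why `capture` has to deduplicate by `hold_id`.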
Money flows require immutable records and a complete audit trail. Simple mutable "balance" fields often drift or break under concurrency and retries. Interviewers expect an append-only ledger and clearly defined state transitions.
- Use an append-only ledger (double-entry or well-typed entries) for each financial event (authorize, capture, release, refund); derive balances and payment state from events, not in-place mutation.
- Bundle ledger writes with event publishing via a transactional outbox to avoid write-publish races; every state change must be durable and traceable.
- Maintain reconciliation jobs and reports that compare your ledger with the external processor’s settlement files; track discrepancies and repair them deterministically.
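To make the append-only ledger concrete, here is a small sketch in which every financial event posts two signed entries that sum to zero, and balances are derived by replaying entries. The account names used in the usage note (`customer_available`, `customer_held`, `merchant_pending`) are assumptions for illustration, not part of the original design.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    payment_id: str
    event: str          # "authorize", "capture", "release", or "refund"
    account: str
    amount_cents: int   # signed: negative leaves the account, positive enters


class Ledger:
    def __init__(self):
        self._entries = []  # append-only; entries are never updated or deleted

    def post(self, payment_id, event, from_account, to_account, amount_cents):
        if amount_cents <= 0:
            raise ValueError("amount_cents must be positive")
        # Two entries per event, so the ledger always sums to zero.
        self._entries.append(Entry(payment_id, event, from_account, -amount_cents))
        self._entries.append(Entry(payment_id, event, to_account, amount_cents))

    def balances(self):
        # Balances are derived from the entries, never stored and mutated.
        totals = defaultdict(int)
        for entry in self._entries:
            totals[entry.account] += entry.amount_cents
        return dict(totals)
```

For example, an authorize could post from `customer_available` to `customer_held`, and the later capture from `customer_held` to `merchant_pending`. The held balance then returns to zero without any row being edited.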
At 10k QPS, synchronous fan-out to an external processor can overwhelm downstreams and create tail latency spikes. Decoupling, sharding, and resource controls are essential to keep p99 in check.
- Front requests with an API tier that enqueues authorization intents to a durable log/queue (e.g., Kafka) and runs a scalable worker pool for processor calls; use per-merchant sharding to smooth hotspots.
- Implement connection pooling, async I/O, timeouts, and circuit breakers for the processor; degrade gracefully (e.g., quick declines) if the processor is unhealthy.
- Cache merchant configuration and risk rules in Redis to avoid DB hot paths; use rate limiting and token buckets to shape egress to the processor.
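The token-bucket shaping mentioned above might look like the sketch below. It takes the clock as an explicit `now` argument so the behavior is deterministic; in a real worker you would likely pass `time.monotonic()`. The class name and parameters are illustrative.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens/second."""

    def __init__(self, rate_per_sec, capacity, now=0.0):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.last = now

    def try_acquire(self, now, cost=1):
        # Refill based on elapsed time, capped at the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should queue, delay, or shed the request
```

A worker would call `try_acquire` before each processor request and back off when it returns `False`, which keeps egress under the processor's rate limit.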
Relevant Patterns
Relevant patterns that you should know for this question
Authorizations, holds, captures, refunds, and reversals form a multi-step workflow with external side effects. You need sagas or durable workflows to orchestrate state transitions, handle partial failures, and implement compensations (e.g., release hold if capture fails).
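As a rough illustration of the compensation idea, the sketch below runs steps in order and, when one fails, undoes the completed steps in reverse. It is an in-process toy: a durable workflow engine would persist the log between steps so that a crashed worker can resume. `run_saga` and the step tuples are hypothetical names.

```python
def run_saga(steps):
    """Run (name, action, compensate) steps in order.

    On the first failure, run the compensations of all completed steps in
    reverse order and report the saga as failed.
    """
    done = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"{name}:failed")
            for prev_name, undo in reversed(done):
                undo()
                log.append(f"{prev_name}:compensated")
            return False, log
        log.append(f"{name}:ok")
        done.append((name, compensate))
    return True, log
```

For the case named above, the authorize step's compensation would release the hold, so a failed capture leaves no funds stranded.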
Daily settlement is a long-running batch that must be chunked, resumable, and safe to rerun. A robust job orchestration pattern with checkpoints and idempotent steps is essential to avoid duplicate captures or missed settlements.
High contention occurs on the same payment or merchant accounts when retries or concurrent updates happen. Idempotency keys, per-entity ordering, optimistic concurrency, and deduplication are critical to prevent double charges and incorrect states.
Relevant Technologies
Relevant technologies that could be used to solve this question
PostgreSQL provides strong ACID guarantees for the financial ledger and payment state machine. With partitioned append-only tables and strict constraints, it supports auditability, reconciliation, and consistent transactions alongside an outbox for reliable event publishing.
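The ledger-plus-outbox write can be sketched as a single transaction. To keep the example self-contained it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL; the table and column names are assumptions, and the `UNIQUE (payment_id, event)` constraint assumes at most one entry per event type per payment, which a real schema with multiple partial refunds would need to relax.

```python
import json
import sqlite3

# In-memory SQLite stands in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ledger_entries (
    id           INTEGER PRIMARY KEY,
    payment_id   TEXT NOT NULL,
    event        TEXT NOT NULL,
    amount_cents INTEGER NOT NULL CHECK (amount_cents > 0),
    UNIQUE (payment_id, event)
);
CREATE TABLE outbox (
    id        INTEGER PRIMARY KEY,
    payload   TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
""")


def record_event(payment_id, event, amount_cents):
    """Write the ledger entry and its outbox message in one transaction."""
    payload = json.dumps(
        {"payment_id": payment_id, "event": event, "amount_cents": amount_cents}
    )
    try:
        with conn:  # commits on success, rolls back if either insert fails
            conn.execute(
                "INSERT INTO ledger_entries (payment_id, event, amount_cents) "
                "VALUES (?, ?, ?)",
                (payment_id, event, amount_cents),
            )
            conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate event or constraint violation; nothing written
```

A separate relay process would read unpublished outbox rows, publish them, and mark them as published. A retried `record_event` for the same payment and event hits the unique constraint and writes nothing.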
Similar Problems to Practice
Related problems to practice for this question
Both designs center on authorizations, holds, captures, refunds, idempotency, and a durable ledger with reconciliation against external processors. The correctness and auditability requirements are identical.
Designing a resilient daily capture pipeline mirrors a distributed scheduler’s needs: chunking, checkpoints, retries, backpressure, and re-entrancy for safe reruns without duplicating work.
Like seat reservations, payment holds are temporary allocations under high contention and strict consistency needs. Both systems require idempotency, deduplication, and careful handling of release/expiry and finalization steps.
Question Timeline
See when this question was last asked and where, including any notes left by other candidates.
Early October, 2025: OpenAI, Mid-level
Early October, 2025: OpenAI, Staff, Screen round
Late August, 2025: OpenAI, Senior