Design a Visa payment network system for transaction processing
Design a payment network system that processes credit card transactions between merchants and issuing banks at scale. The system should handle transaction routing, bank connectivity, failure scenarios (retries, idempotency), and network outages.
Asked at:
Databricks
Visa-like payment networks are real-time transaction routers that sit between merchants and issuing banks to authorize, capture, refund, and settle card payments at massive scale. Think of the moment you tap your card: the network validates the card, routes the request to the correct bank, and returns an approve/decline in a few hundred milliseconds, then later clears and settles funds. Interviewers ask this because it blends low-latency request/response with durable, auditable workflows under failure. You must reason about routing (BIN ranges), idempotency, retries, backpressure, multi-region availability, and correctness of money movement—all while meeting strict SLAs and regulatory constraints. The goal is to see if you can build a fault-tolerant, scalable, and financially correct system, not just an API layer.
Common Functional Requirements
Most candidates end up covering this set of core functionalities
Users should be able to submit purchase authorization requests and receive a synchronous approve/decline decision within strict latency SLAs.
Users should be able to capture, void, and refund prior authorizations reliably, preserving links to the original transaction for auditability.
Users should be able to submit transactions that are automatically routed to the correct issuer/processor, with resilient failover when endpoints degrade or go offline.
Users should be able to receive asynchronous status updates (e.g., webhooks) and access daily settlement/reconciliation reports.
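As a concrete reference point, here is a minimal sketch in Python of the request/response shapes these requirements imply. Every field, type, and enum name is an illustrative assumption rather than a prescribed API; real card networks exchange ISO 8583-style messages rather than objects like these.

    from dataclasses import dataclass
    from enum import Enum
    from typing import Optional

    class Decision(Enum):
        APPROVED = "approved"
        DECLINED = "declined"

    @dataclass(frozen=True)
    class AuthorizationRequest:
        # Hypothetical fields for illustration only.
        merchant_id: str
        merchant_transaction_id: str   # doubles as an idempotency key
        card_token: str                # tokenized PAN, never the raw card number
        amount_minor_units: int        # e.g., cents, to avoid floating point
        currency: str                  # ISO 4217 code, e.g., "USD"

    @dataclass(frozen=True)
    class AuthorizationResponse:
        authorization_id: str
        decision: Decision
        decline_reason: Optional[str] = None

    # Follow-up operations reference the original authorization for auditability.
    @dataclass(frozen=True)
    class CaptureRequest:
        authorization_id: str
        amount_minor_units: int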
Common Deep Dives
Common follow-up questions interviewers like to ask for this question
Idempotency is the core reliability property in payments: networks retry on timeouts and clients often retry on 5xx, so duplicates are inevitable. Interviewers want to see that you can deduplicate without sacrificing throughput and that you understand the difference between at-least-once delivery and exactly-once effects on money movement.
- Use an idempotency key (e.g., merchantTransactionId + amount + currency + card fingerprint) stored in a fast, write-optimized store with TTL; consider storing the canonical result to return on repeat requests.
- Place a dedupe gate early (before side effects) and make downstream writes idempotent (e.g., upserts with conditional writes) to survive consumer retries from Kafka or similar.
- Consider the outbox/inbox pattern and deterministic partitioning (by PAN hash) so all operations for a card hit the same partition, reducing concurrent duplicates across regions.
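A minimal sketch of the dedupe gate described above, assuming a single-process in-memory store for illustration; a production deployment would back this with a replicated, write-optimized store and a conditional write to close the race between concurrent first attempts.

    import hashlib
    import threading
    import time

    class IdempotencyStore:
        """Maps an idempotency key to the canonical result, with a TTL."""

        def __init__(self, ttl_seconds: int = 24 * 3600):
            self._ttl = ttl_seconds
            self._entries = {}            # key -> (expires_at, result)
            self._lock = threading.Lock()

        @staticmethod
        def make_key(merchant_txn_id: str, amount: int, currency: str, card_fingerprint: str) -> str:
            raw = f"{merchant_txn_id}|{amount}|{currency}|{card_fingerprint}"
            return hashlib.sha256(raw.encode()).hexdigest()

        def get(self, key: str):
            with self._lock:
                entry = self._entries.get(key)
                if entry and entry[0] > time.time():
                    return entry[1]
                return None

        def put(self, key: str, result) -> None:
            with self._lock:
                self._entries[key] = (time.time() + self._ttl, result)

    def authorize(store: IdempotencyStore, request: dict, decide):
        """Dedupe gate placed before side effects: retries replay the stored canonical result.
        A real store would use a conditional write (e.g., set-if-absent) so two concurrent
        first attempts cannot both pass the gate."""
        key = IdempotencyStore.make_key(
            request["merchant_txn_id"], request["amount"],
            request["currency"], request["card_fingerprint"],
        )
        cached = store.get(key)
        if cached is not None:
            return cached
        decision = decide(request)        # the only place side effects happen
        store.put(key, decision)
        return decision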
Low latency and correct routing are the day-one requirements. Candidates often forget connection management and failover behavior when a bank is slow or down.
- Keep an in-memory, hot-reloadable BIN table for routing; pre-resolve issuer endpoints and maintain warm TLS connections with aggressive connection pooling.
- Add circuit breakers, health checks, and per-issuer timeouts; on failure, fail fast or enter stand-in processing using conservative risk rules and later reconcile.
- Co-locate edge auth clusters near merchants, perform early validation/tokenization at the edge, and use geo-aware routing to the nearest healthy issuer connection hub.
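A sketch of an in-memory BIN-range lookup paired with a simple per-issuer circuit breaker; the BIN ranges, endpoint names, and thresholds are made-up values for illustration.

    import bisect
    import time

    # Sorted, non-overlapping BIN ranges -> issuer endpoint (values are illustrative).
    BIN_RANGES = [
        (400000, 499999, "issuer-a.example:8583"),
        (510000, 559999, "issuer-b.example:8583"),
        (601100, 601199, "issuer-c.example:8583"),
    ]
    _STARTS = [start for start, _, _ in BIN_RANGES]

    def route_bin(bin6: int):
        """Return the issuer endpoint whose BIN range contains bin6, else None."""
        i = bisect.bisect_right(_STARTS, bin6) - 1
        if i >= 0:
            start, end, endpoint = BIN_RANGES[i]
            if start <= bin6 <= end:
                return endpoint
        return None

    class CircuitBreaker:
        """Fails fast for an issuer after repeated failures, then probes after a cooldown."""

        def __init__(self, failure_threshold: int = 5, cooldown_seconds: int = 30):
            self.failure_threshold = failure_threshold
            self.cooldown_seconds = cooldown_seconds
            self.failures = 0
            self.opened_at = None

        def allow_request(self) -> bool:
            if self.opened_at is None:
                return True
            if time.time() - self.opened_at >= self.cooldown_seconds:
                return True                  # half-open: allow a single probe request
            return False                     # open: fail fast or fall back to stand-in rules

        def record_success(self) -> None:
            self.failures, self.opened_at = 0, None

        def record_failure(self) -> None:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()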
Payments must stay up during a regional outage, but naïve multi-region can lead to split-brain and duplicate approvals. Interviewers look for clear data partitioning and consistency choices tied to business rules.
- Deterministically partition by PAN hash so all same-card operations go to one primary partition/region, with a promoted secondary on failover to avoid concurrent approvals.
- Use append-only event logs for the ledger and reconcile asynchronously across regions; keep the synchronous auth path minimal and strongly guarded by idempotency.
- Replicate routing tables and risk configs globally with versioning; use leader election per partition and monotonic sequence numbers to prevent conflicting decisions.
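A sketch of deterministic partitioning by PAN hash so that all operations for the same card land on one partition with a single primary region; the keyed hash, partition count, and region assignments below are illustrative assumptions.

    import hashlib
    import hmac

    NUM_PARTITIONS = 256   # illustrative; each partition has exactly one primary region/leader

    def partition_for_card(pan: str, secret_key: bytes) -> int:
        """Keyed hash of the PAN -> stable partition id (never log or store the raw PAN)."""
        digest = hmac.new(secret_key, pan.encode(), hashlib.sha256).digest()
        return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

    # Routing table mapping partition -> (primary region, promoted secondary on failover).
    # Versioned so replicas can detect stale assignments; values are illustrative.
    PARTITION_ASSIGNMENT = {
        "version": 42,
        "assignments": {0: ("us-east", "us-west"), 1: ("eu-west", "eu-central")},
    }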
Authorization is real-time, but money movement is finalized later via clearing and settlement. The audit trail and replayability are non-negotiable in a system design interview for finance.
- Model transactions as immutable events with a derived ledger; never overwrite financial history, append compensations (refunds, chargebacks) instead.
- Use a durable event backbone for clearing batches and build idempotent settlement jobs; generate daily reports for merchants and issuers from the same source of truth.
- Implement automated reconciliation pipelines that compare issuer responses, network records, and merchant batches; flag mismatches for manual review with traceable correlation IDs.
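A sketch of an append-only transaction log with a derived ledger view, where refunds and chargebacks are appended as compensating events instead of overwriting history; the event types and field names are assumptions for illustration.

    from dataclasses import dataclass
    from typing import List

    @dataclass(frozen=True)
    class LedgerEvent:
        transaction_id: str
        sequence: int              # monotonic per transaction
        event_type: str            # "AUTHORIZED", "CAPTURED", "REFUNDED", "CHARGEBACK"
        amount_minor_units: int
        correlation_id: str        # ties merchant, network, and issuer records together

    class TransactionLog:
        """Append-only: financial history is never overwritten, only compensated."""

        def __init__(self):
            self._events: List[LedgerEvent] = []

        def append(self, event: LedgerEvent) -> None:
            self._events.append(event)

        def net_captured(self, transaction_id: str) -> int:
            """Derived view: captured amount minus compensations for one transaction."""
            total = 0
            for e in self._events:
                if e.transaction_id != transaction_id:
                    continue
                if e.event_type == "CAPTURED":
                    total += e.amount_minor_units
                elif e.event_type in ("REFUNDED", "CHARGEBACK"):
                    total -= e.amount_minor_units
            return total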
Relevant Patterns
Relevant patterns that you should know for this question
Authorizations, captures, refunds, clearing, and settlement form a multi-step workflow with compensations (voids/chargebacks). Modeling this with sagas or a workflow engine ensures stateful progress, retries, and correctness across service boundaries.
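A minimal saga sketch showing how completed steps are compensated in reverse order when a later step fails; the step names and compensations are illustrative, and in practice the saga state would live in a durable workflow engine rather than in process memory.

    class SagaStep:
        def __init__(self, name, action, compensation):
            self.name = name
            self.action = action              # performs the step, may raise
            self.compensation = compensation  # undoes the step if a later one fails

    def run_saga(steps):
        """Execute steps in order; on failure, compensate completed steps in reverse."""
        completed = []
        try:
            for step in steps:
                step.action()
                completed.append(step)
        except Exception:
            for step in reversed(completed):
                step.compensation()   # e.g., void an authorization, reverse a ledger entry
            raise

    # Illustrative usage with hypothetical functions:
    # run_saga([
    #     SagaStep("authorize", authorize_with_issuer, void_authorization),
    #     SagaStep("ledger", append_ledger_entry, append_reversal_entry),
    #     SagaStep("notify", send_webhook, lambda: None),   # notification needs no compensation
    # ])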
High contention occurs on the same card or order during retries and timeouts. Idempotency keys, conditional writes, and deterministic partitioning prevent duplicate charges and race conditions under load.
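One way to make such a conditional write concrete: a unique constraint on the idempotency key plus an insert that does nothing on conflict, so concurrent retries collapse to a single charge. The table, columns, and the use of SQLite here are illustrative stand-ins for whatever durable store you choose.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE charges (
            idempotency_key TEXT PRIMARY KEY,   -- uniqueness enforces one charge per key
            amount_minor_units INTEGER NOT NULL,
            currency TEXT NOT NULL
        )
    """)

    def charge_once(key: str, amount: int, currency: str) -> bool:
        """Conditional write: only the request that actually created the row reports success."""
        before = conn.total_changes
        conn.execute(
            "INSERT INTO charges (idempotency_key, amount_minor_units, currency) "
            "VALUES (?, ?, ?) ON CONFLICT (idempotency_key) DO NOTHING",
            (key, amount, currency),
        )
        conn.commit()
        return conn.total_changes > before   # no change means a retry already charged

    # Two retries with the same key: only the first reports that it charged.
    assert charge_once("order-123", 2500, "USD") is True
    assert charge_once("order-123", 2500, "USD") is False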
Payment networks are write-heavy (authorizations, state transitions, ledger events). You must design for high-throughput, low-latency writes with durable persistence and backpressure handling.
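A sketch of explicit backpressure on the write path: a bounded buffer between request handlers and the durable ledger writer, where a full buffer tells the caller to shed load rather than queue unboundedly. The capacity and batch size are illustrative.

    import queue

    class WriteBuffer:
        """Bounded buffer between request handlers and the durable ledger writer."""

        def __init__(self, max_pending: int = 10_000):
            self._q = queue.Queue(maxsize=max_pending)

        def submit(self, event) -> bool:
            """Non-blocking enqueue; False signals the caller to apply backpressure
            (e.g., return 429/503 upstream or fall back to stand-in processing)."""
            try:
                self._q.put_nowait(event)
                return True
            except queue.Full:
                return False

        def drain(self, persist, batch_size: int = 500):
            """Writer loop (run on a dedicated thread): persist events in batches
            to keep durable writes high-throughput."""
            while True:
                batch = [self._q.get()]          # block until at least one event arrives
                while len(batch) < batch_size:
                    try:
                        batch.append(self._q.get_nowait())
                    except queue.Empty:
                        break
                persist(batch)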
Relevant Technologies
Relevant technologies that could be used to solve this question
Similar Problems to Practice
Related problems to practice for this question
Designing a Stripe-like system shares the same concerns: real-time authorization, idempotency on retries, webhooks, and asynchronous settlement with strong auditability.
Order routing to exchanges resembles issuer routing: low-latency paths, failover, exactly-once effects on money/positions, and an immutable ledger for reconciliation and compliance.
High-throughput event ingestion with deduplication and replay mirrors the need for at-least-once processing, idempotent consumers, and backfill/reprocessing in clearing and reconciliation.
Red Flags to Avoid
Common mistakes that can sink candidates in an interview
Question Timeline
See when this question was last asked and where, including any notes left by other candidates.
Late August 2025: Databricks, Senior Manager
Late May 2021: Databricks, Staff