Design a Feature Flag Service
Design a feature flag service that allows software teams to enable/disable features dynamically without code deployment, supporting targeted rollouts to user segments (by region, percentage, etc.) and handling 1M flag evaluations per second with minimal latency.
Asked at: Stripe, Amazon
A feature flag service is a configuration platform that lets teams turn features on or off at runtime, target rollouts to specific users or regions, and gradually ramp changes without deploying new code. Think of products like LaunchDarkly or ConfigCat that provide client SDKs and admin consoles to manage flags safely at scale. Interviewers ask this because it stresses real-world trade-offs: ultra-low-latency read paths, safe and consistent rollout semantics, global propagation of config updates, and robust failure handling. They’re testing whether you can separate read-heavy evaluation from infrequent writes, choose the right consistency model, design client/server SDK behavior, and plan for observability and rollback. Expect follow-ups on caching, real-time updates, deterministic bucketing, and resilience under partial outages.
Common Functional Requirements
Most candidates end up covering this set of core functionalities
Users should be able to create, edit, and disable feature flags, including scheduling gradual rollouts and instant rollbacks.
Users should be able to target features to user segments via rules (e.g., by region, user attributes, and percentage-based rollouts) with clear precedence.
Users should be able to evaluate flags in their applications with minimal latency and high availability, without needing to redeploy code.
Users should be able to see audit history and metrics for flag changes (who changed what, when, and the impact) to ensure safe operations.
Common Deep Dives
Common follow-up questions interviewers like to ask for this question
This system is read-dominant, so your performance story lives or dies on the evaluation path. Interviewers want to see that you avoid a network hop for every check and that your hot path is CPU- and cache-friendly.
- Consider client-side SDKs that keep an in-memory snapshot of flag rules and evaluate locally (sketched after this list); your server should be an infrequent control plane, not on the critical path.
- You could precompile rules into a compact, branch-friendly structure (e.g., ordered match lists) and use constant-time deterministic hashing for percentage rollouts.
- Keep payloads small: compress config, shard by environment/project, and avoid per-request database/cache lookups on the hot path.
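As a concrete illustration, here is a minimal Python sketch of client-side evaluation against an in-memory snapshot. The Rule and Flag shapes, field names, and boolean-only variations are simplifying assumptions for this sketch, not any vendor's SDK format; percentage bucketing is shown separately under the rollout-correctness deep dive.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    attribute: str      # user attribute to test, e.g., "region"
    allowed: frozenset  # attribute values that satisfy the rule
    variation: bool     # value served when the rule matches

@dataclass
class Flag:
    key: str
    enabled: bool                              # per-flag kill switch
    rules: list = field(default_factory=list)  # ordered; first match wins
    default: bool = False                      # served when no rule matches

class FlagSnapshot:
    """Immutable view of all flag rules; the SDK swaps whole snapshots atomically."""

    def __init__(self, flags: dict):
        self._flags = flags  # flag key -> Flag

    def evaluate(self, flag_key: str, user: dict) -> bool:
        flag = self._flags.get(flag_key)
        if flag is None or not flag.enabled:
            return False  # unknown or killed flags evaluate to off
        for rule in flag.rules:  # ordered match list; no network I/O on the hot path
            if user.get(rule.attribute) in rule.allowed:
                return rule.variation
        return flag.default
```

Because evaluation is a dict lookup plus a short loop over precompiled rules, a flag check costs microseconds in process memory instead of a network round trip, which is what makes 1M evaluations/s tractable.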
Config propagation must be fast and controlled. Interviewers look for a push-based strategy with safe fallbacks, versioning, and backoff to prevent storms when many clients refresh at once.
- Consider a pub/sub channel (e.g., streaming updates) for near real-time invalidation, with clients using ETags/versions and incremental diffs to update snapshots.
- You could combine long polling or SSE with jittered retry and exponential backoff to avoid thundering herds during outages (see the sketch after this list).
- Use semantic versioning of configs and TTLs so clients can serve slightly stale values during brief partitions while ensuring eventual convergence.
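A minimal sketch of the client refresh path under these ideas, assuming a hypothetical fetch_config(etag) call that returns (status, etag, payload) and uses HTTP 304 to mean "snapshot unchanged"; the interval and cap values are arbitrary.

```python
import random
import time

def refresh_loop(fetch_config, apply_snapshot, poll_interval=30.0, cap=300.0):
    """Poll for config updates; back off with full jitter when the control plane is down."""
    etag = None
    failures = 0
    while True:
        try:
            status, new_etag, payload = fetch_config(etag)  # hypothetical HTTP call
            if status == 200:            # new version available
                apply_snapshot(payload)  # atomic swap of the in-memory snapshot
                etag = new_etag          # remember which version we now hold
            failures = 0                 # 200 or 304 both mean "healthy"
            time.sleep(poll_interval)
        except Exception:
            failures += 1
            backoff = min(cap, poll_interval * (2 ** failures))
            # full jitter spreads retries so clients don't reconnect in lockstep
            time.sleep(random.uniform(0, backoff))
```

The same backoff-with-jitter shape applies to SSE or streaming reconnects; only the transport changes.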
Rollout correctness is critical; random selection per request causes user flapping and poor UX. Interviewers expect deterministic bucketing and clear rule precedence.
- Use a stable, deterministic hash of (flag_key + user_stable_id + salt) to map users into buckets (sketched after this list); document behavior when the stable ID is missing.
- Consider rule ordering and short-circuiting: segment rules first, then percentage; ensure the same logic exists across all SDKs and the server for parity.
- Version segments and maintain immutable references so changes don’t silently move users between buckets mid-rollout.
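A minimal sketch of that bucketing recipe; SHA-256 and the 10,000-bucket granularity are assumptions here, and any stable hash with good distribution works as long as every SDK and the server use the same one.

```python
import hashlib

BUCKETS = 10_000  # 0.01% rollout granularity

def bucket_for(flag_key: str, user_id: str, salt: str) -> int:
    """Stable bucket in [0, BUCKETS): identical inputs always yield the same bucket."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}:{salt}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS

def in_rollout(flag_key: str, user_id: str, salt: str, percent: float) -> bool:
    # Users below the threshold stay enrolled as percent grows, so ramping
    # from 5% to 10% only adds users and never flaps anyone out.
    return bucket_for(flag_key, user_id, salt) < percent / 100 * BUCKETS
```

Including flag_key in the hash keeps rollouts independent across flags, and the salt lets you deliberately reshuffle buckets (e.g., to rerun an experiment) without changing the hash function.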
Resilience and safety separate great designs from good ones. Interviewers expect defaults, kill switches, and guardrails.
- You could design SDKs to fail open or fail closed according to a per-flag policy, with last-known-good snapshots and a global kill-switch path that requires no dependencies (sketched after this list).
- Add change validation, dry-run, and two-person review for risky flags; auto-roll back on error-rate or latency SLO breaches using simple policies.
- Isolate tenants and environments, rate-limit updates, and use circuit breakers to prevent cascading failures when the control plane is degraded.
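A minimal sketch of a per-flag failure policy backed by a last-known-good snapshot; the FailurePolicy names and client shape are illustrative assumptions, not a real SDK API.

```python
import enum

class FailurePolicy(enum.Enum):
    FAIL_OPEN = "open"      # serve the feature if evaluation is impossible
    FAIL_CLOSED = "closed"  # suppress the feature if evaluation is impossible

class SafeClient:
    def __init__(self, policies: dict):
        self._policies = policies     # flag key -> FailurePolicy
        self._last_known_good = None  # most recent snapshot that passed validation

    def update_snapshot(self, snapshot):
        self._last_known_good = snapshot  # only called after the payload validates

    def evaluate(self, flag_key: str, user: dict) -> bool:
        snapshot = self._last_known_good
        if snapshot is not None:
            try:
                return snapshot.evaluate(flag_key, user)
            except Exception:
                pass  # corrupt rule or bad data: fall through to the static policy
        policy = self._policies.get(flag_key, FailurePolicy.FAIL_CLOSED)
        return policy is FailurePolicy.FAIL_OPEN
```

Defaulting unknown flags to fail-closed is the conservative choice for flags gating new, risky features; fail-open suits flags that protect an already-shipped code path.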
Relevant Patterns
Relevant patterns that you should know for this question
Flag evaluation is an extreme read-heavy workload (1M+ evaluations/s) with tiny writes. You need client-side and edge caches, compact rule evaluation, and read-optimized data flows to keep latency low and availability high.
Teams expect changes to take effect within seconds without redeploys. A push-based, pub/sub update pipeline with versioned snapshots is essential to invalidate caches and propagate new rules safely and quickly.
Hot flags and mass invalidations can cause stampedes on caches and streams. You need jitter, backoff, single-flight reloads, and careful broadcast strategies to prevent overload during global rollouts or incidents (a single-flight sketch follows this list).
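To make the single-flight idea concrete, here is a minimal thread-based sketch: when many callers need a reload at once, one performs the expensive fetch and the rest wait for its result. The class name and loader callback are illustrative assumptions.

```python
import threading

class SingleFlight:
    def __init__(self, loader):
        self._loader = loader  # expensive call, e.g., fetch the full config
        self._lock = threading.Lock()
        self._inflight = None  # Event that exists while a reload is running
        self._value = None

    def get(self):
        with self._lock:
            leader = self._inflight is None
            if leader:
                self._inflight = threading.Event()
            event = self._inflight
        if leader:
            try:
                self._value = self._loader()  # only one caller hits the origin
            finally:
                event.set()                   # wake followers even on failure
                with self._lock:
                    self._inflight = None
        else:
            event.wait()  # followers block briefly and reuse the leader's result
        return self._value
```

Combined with jittered TTLs, this turns a mass cache miss into a single origin request per process instead of a thundering herd.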
Relevant Technologies
Relevant technologies that could be used to solve this question
Similar Problems to Practice
Related problems to practice for this question
Both require ultra-low-latency, per-request decisions with deterministic behavior and strong fallback strategies under partial failures. Client-local evaluation and consistency trade-offs are central in both designs.
You must manage cache invalidation, handle hot keys, and prevent thundering herds when configurations change. Read-scaling and stale-while-revalidate patterns apply directly.
Both rely on real-time fanout to a large number of clients, with backpressure, reconnection, and ordering concerns. The mechanics of pushing updates quickly and safely are closely related.
Red Flags to Avoid
Common mistakes that can sink candidates in an interview
Question Timeline
See when this question was last asked and where, including any notes left by other candidates.
- Early October, 2025: Stripe, Senior
- Early September, 2025: Stripe, Staff
- Late August, 2025: Stripe, Senior