Design a Top-K System
Design a system that efficiently retrieves top-k items (songs, videos, hashtags, etc.) based on user activity or engagement metrics within specified time windows. The system should handle real-time data aggregation and support queries like "top 10 songs in the last 7 days" or "trending hashtags in the past 24 hours."
Asked at:
Meta
Uber
Amazon
Top‑K Ranking is a real‑time leaderboard service built into apps like YouTube (Trending videos), Spotify (Your top songs for the week), Instagram/Twitter (trending hashtags), and Uber Eats (popular restaurants). Users (and downstream services) request the top K items for a time window and segment (global, region, category, or per user), and expect fast, fresh, and stable results. Interviewers ask this to test whether you can stream-process high‑volume events, maintain time‑windowed counts, handle hot keys, precompute results, and serve low‑latency reads at scale. Strong answers show a clear separation of write aggregation and read serving, thoughtful handling of late/out‑of‑order data, and pragmatic trade‑offs between freshness, cost, and correctness.
Hello Interview Problem Breakdown
Design YouTube Top K
System design answer key for designing a feature to find the most popular videos like YouTube's trending section, built by FAANG managers and staff engineers.
Common Functional Requirements
Most candidates end up covering this set of core functionalities
Users should be able to retrieve the top K items for fixed windows (for example: last hour, last day, last 7 days, all‑time) with a clear freshness target.
Users should be able to scope results by segment (for example: global, region, category, or per user).
Users should be able to request different K values (for example: 10, 50, 100) and get stable, deterministic ordering including tie handling (see the sketch after this list).
Users should be able to page through results when K is large and see consistent, cacheable responses.
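For the tie-handling requirement above, deterministic ordering can be as simple as sorting by count descending and breaking ties by item ID. A quick illustration (the field names here are hypothetical):

```python
def rank_key(entry: dict) -> tuple:
    """Deterministic ordering: higher count first, then lexicographic item ID to break ties."""
    return (-entry["count"], entry["item_id"])

items = [
    {"item_id": "song_b", "count": 120},
    {"item_id": "song_a", "count": 120},
    {"item_id": "song_c", "count": 95},
]
top_2 = sorted(items, key=rank_key)[:2]
# song_a and song_b keep a stable relative order even though their counts tie,
# so paged and cached responses stay consistent across requests.
```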
Common Deep Dives
Common follow-up questions interviewers like to ask for this question
This is the core of the real‑time leaderboard: you need to aggregate counts by time window and update top‑K incrementally. Interviewers at companies like Meta expect you to reason about windowing, incremental maintenance, and state recovery.
- Consider ingesting events into a durable log (e.g., Kafka) and using a stateful stream processor (e.g., Flink) with event‑time windows, watermarks, and allowed lateness to compute rolling counts.
- You could choose either rising/falling edge accounting (add on arrival, subtract when the window expires) or ring buffers of sub‑windows (e.g., 24 hourly buckets) to avoid rescans and simplify expiration.
- Maintain a buffer larger than K (e.g., 2–5x K) and periodically re‑rank to avoid stale items lingering; checkpoint state and offsets frequently to enable fast recovery without double counting.
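A minimal sketch of the ring-buffer variant, assuming an in-memory aggregator with 24 hourly buckets and a re-rank buffer a few times larger than K; the class and method names are illustrative, not a specific framework API:

```python
import heapq
import time
from collections import defaultdict

class RollingTopK:
    """Rolling 24-hour top-K over a ring buffer of hourly sub-window counts (illustrative)."""

    def __init__(self, k: int, num_buckets: int = 24, buffer_factor: int = 4):
        self.k = k
        self.num_buckets = num_buckets
        self.buffer_size = k * buffer_factor          # keep a ~4x K candidate buffer to re-rank from
        self.buckets = [defaultdict(int) for _ in range(num_buckets)]
        self.totals = defaultdict(int)                # rolling counts across all live buckets
        self.current_hour = int(time.time() // 3600)

    def _advance(self, target_hour: int) -> None:
        # Expire one bucket per elapsed hour: subtract its counts, then reuse the slot.
        while self.current_hour < target_hour:
            self.current_hour += 1
            idx = self.current_hour % self.num_buckets
            for item, cnt in self.buckets[idx].items():
                self.totals[item] -= cnt
                if self.totals[item] <= 0:
                    del self.totals[item]
            self.buckets[idx] = defaultdict(int)

    def record(self, item: str, event_ts: float, count: int = 1) -> None:
        event_hour = int(event_ts // 3600)
        if event_hour < self.current_hour - self.num_buckets + 1:
            return  # too late for the window; repair via offline backfill instead
        self._advance(max(event_hour, self.current_hour))
        self.buckets[event_hour % self.num_buckets][item] += count
        self.totals[item] += count

    def top_k(self) -> list:
        # Re-rank from the buffered totals; ties broken by item ID for determinism.
        candidates = heapq.nlargest(self.buffer_size, self.totals.items(),
                                    key=lambda kv: (kv[1], kv[0]))
        return sorted(candidates, key=lambda kv: (-kv[1], kv[0]))[: self.k]
```

Expiring a window costs one pass over a single hourly bucket rather than a rescan of all raw events, which is what keeps incremental maintenance cheap.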
Skew is common in trending systems: a single hashtag, song, or video can dominate traffic. Your design should spread contention and apply backpressure gracefully.
- Use sharded counters (key#shard) with skew‑aware partitioning; periodically merge shards via the stream processor or batched Lua scripts in Redis to avoid single‑key hotspots.
- Add local pre‑aggregation and batching (e.g., N events or T milliseconds) before hitting shared stores; enable producer throttling and backpressure to protect downstream systems.
- Consider dynamic shard scaling for extreme hot keys and circuit breakers to shed non‑critical work under load while preserving correctness guarantees.
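A rough illustration of sharded counters with a periodic merge, assuming the redis-py client; the key layout, shard count, and candidate sorted set are hypothetical:

```python
import random
import redis  # assumes the redis-py client; connection details are illustrative

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

NUM_SHARDS = 16  # tune per observed skew; extreme hot keys can be given more shards

def incr_sharded(item_id: str, amount: int = 1) -> None:
    """Spread writes for a hot item across NUM_SHARDS keys to avoid single-key contention."""
    shard = random.randrange(NUM_SHARDS)
    r.incrby(f"cnt:{item_id}#{shard}", amount)

def merge_sharded(item_id: str) -> int:
    """Run periodically from a background worker: fold shard counts into one total."""
    keys = [f"cnt:{item_id}#{s}" for s in range(NUM_SHARDS)]
    values = r.mget(keys)
    total = sum(int(v) for v in values if v is not None)
    # Push the merged total into a candidate sorted set for the serving side.
    r.zadd("topk:candidates", {item_id: total})
    return total
```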
Top‑K is read‑heavy and latency‑sensitive. Interviewers look for precomputation, cache design, and a plan for the long tail (e.g., per‑user Top‑K) without exploding cost.
- Precompute and store ready‑to‑serve top‑K per window and segment in an in‑memory store (e.g., Redis ZSETs); use versioned keys per window with short TTLs aligned to freshness.
- Warm caches for heavy segments (global/region/category) and compute per‑user lists only for active users or on demand with quotas and LRU eviction.
- Avoid fan‑out on read and cross‑shard merges; do merges on write or via background workers so reads are a single key lookup.
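One way this could look with Redis ZSETs, assuming a versioned-key scheme plus a "current" pointer so readers always hit a fully written list; the key layout and TTLs are illustrative:

```python
import redis

r = redis.Redis(decode_responses=True)

def publish_topk(window: str, segment: str, version: str,
                 ranked: list, ttl_seconds: int) -> None:
    """Write a ready-to-serve top-K list under a versioned key, then flip the pointer.

    ranked: list of (item_id, score) pairs already merged/ranked on the write path.
    """
    key = f"topk:{window}:{segment}:{version}"
    r.zadd(key, {item: score for item, score in ranked})
    r.expire(key, ttl_seconds)
    # Atomic pointer flip: readers never see a partially written list.
    r.set(f"topk:{window}:{segment}:current", key, ex=ttl_seconds)

def read_topk(window: str, segment: str, k: int, offset: int = 0) -> list:
    """Read path is a single pointer lookup plus one ZSET range; supports paging via offset."""
    key = r.get(f"topk:{window}:{segment}:current")
    if key is None:
        return []
    return r.zrevrange(key, offset, offset + k - 1, withscores=True)
```

Because the merge happens on write, the read path never fans out across shards, which is what keeps p99 latency flat under high QPS.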
Real‑world event streams are messy. Interviewers expect a plan for idempotency, late data, and recovery that preserves correctness and operational simplicity.
- Deduplicate with event IDs and a time‑bounded dedup store; in stream processors, use watermarks and allowed lateness to update counts for late arrivals without double counting.
- Use exactly‑once or effectively‑once sinks (e.g., Flink two‑phase commit) and periodic checkpoints/snapshots so you can replay from Kafka offsets on restart safely.
- Run an offline reconciliation/backfill job (e.g., daily) against cold storage to catch drift and repair aggregates for compliance or end‑of‑week summaries.
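A small sketch of time-bounded deduplication using an atomic SET NX EX claim per event ID; Redis is assumed here, but any store with atomic conditional writes and TTLs would work:

```python
import redis

r = redis.Redis(decode_responses=True)

DEDUP_TTL = 6 * 3600  # keep dedup entries a bit longer than the allowed lateness

def process_event(event_id: str, item_id: str, event_ts: float) -> bool:
    """Return True if the event was counted, False if it was a duplicate."""
    # SET with nx=True is an atomic claim on the event ID; the TTL bounds memory growth.
    claimed = r.set(f"dedup:{event_id}", 1, nx=True, ex=DEDUP_TTL)
    if not claimed:
        return False  # already processed, e.g., redelivery after a consumer restart
    # ... apply the count to the windowed aggregates here (see the ring-buffer sketch above)
    return True
```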
Relevant Patterns
Relevant patterns that you should know for this question
Play/view/tag events can reach millions per second with heavy skew. You must aggregate writes via streaming, sharded counters, and batching to maintain windowed counts and avoid hot key contention.
Top lists are queried frequently by users and downstream services. Precomputing per‑window, per‑segment rankings and serving from a fast cache is critical to keep p99 latency low under high QPS.
Viral items create hotspots on counters and sorted sets. Sharding, skew‑aware partitioning, and merge‑on‑write patterns prevent a single key or shard from becoming a bottleneck.
Relevant Technologies
Relevant technologies that could be used to solve this question
Similar Problems to Practice
Related problems to practice for this question
Both maintain and serve real‑time rankings over high‑velocity streams with time windows, precomputation, and caching. The architectural core—windowed aggregation and hot‑key handling—is identical.
It also requires counting massive event streams with late/out‑of‑order data, windowed aggregation, and producing low‑latency, pre‑aggregated results for downstream services.
While the goal differs, both rely on accurate time‑windowed counters under skew and require expiration/rolling windows, deduplication, and contention control across distributed nodes.
Red Flags to Avoid
Common mistakes that can sink candidates in an interview
Question Timeline
See when this question was last asked and where, including any notes left by other candidates.
Mid October, 2025
Meta
Senior
Trending hashtags.
Early October, 2025
Meta
Manager
Top K songs listened to by a given user of Facebook, along with the top K songs listened to by their friends and the top K songs listened to by the entire world.
Late September, 2025
Meta
Senior