Limited Time Offer:Up to 20% off Hello Interview Premium
Up to 20% off Hello Interview Premium 🎉
Hello Interview
Learn System Design
Introduction
How to Prepare
Delivery Framework
Core Concepts
Key Technologies
Common Patterns
Question Breakdowns
Networking Essentials
API Design
Data Modeling
Quick Reference
Caching
Sharding
Consistent Hashing
CAP Theorem
Database Indexing
Numbers to Know
Bitly
Dropbox
Local Delivery Service
Ticketmaster
FB News Feed
Tinder
LeetCode
WhatsApp
Rate Limiter
FB Live Comments
FB Post Search
YouTube Top K
Uber
YouTube
Web Crawler
Ad Click Aggregator
News Aggregator
Yelp
Strava
Online Auction
Price Tracking Service
Instagram
Robinhood
Google Docs
Distributed Cache
Job Scheduler
Payment System
Metrics Monitoring
ChatGPT
Real-time Updates
Dealing with Contention
Multi-step Processes
Scaling Reads
Scaling Writes
Handling Large Blobs
Managing Long Running Tasks
Redis
Elasticsearch
Kafka
API Gateway
Cassandra
DynamoDB
PostgreSQL
Flink
ZooKeeper
Time Series Databases
Data Structures for Big Data
Vector Databases
Vote For New Content
Pricing
Sign in / Sign up
Search
⌘K
Pricing

Tutor

Hello InterviewQuick Reference
Full Article
Quick Reference

Data Modeling

Interview Standards

Show minimum schema

Include key fields, relationships, and index or partition notes beside the database.

Tie choices to requirements

Say how data volume, access pattern, or consistency need caused each schema choice.

Stay domain-specific

Use users, tweets, follows for Twitter-like systems, not abstract entity names.

Do not over-model

Good enough is clear and functional; no full normalization or complete schema diagram expected.

Database Choice

Relational Databases (SQL) are the default for most designs with PostgreSQL, clear entities, joins, and ACID transactions; Document Databases fit changing, nested, or varied records but complicate updates; Key-Value Stores fit caches, sessions, feature flags, and exact-key lookups with limited queries; Wide-Column Databases fit enormous writes, time-series, and append-heavy analytics; Graph Databases are almost never needed in interviews because traversal gains rarely justify operational complexity.
  • Relational Databases (SQL): Default for most designs; PostgreSQL handles clear entities, joins, and ACID transactions.
  • Document Databases: Use for changing, nested, or varied records; eliminates joins but complicates updates.
  • Key-Value Stores: Use for caches, sessions, feature flags, or exact-key lookups; limited query capability.
  • Wide-Column Databases: Use for enormous writes, time-series, or append-heavy analytics; model around queries.
  • Graph Databases: Almost never in interviews; traversal gains rarely justify operational complexity.

Schema Drivers

Data volume

Decides whether data fits one store or must split across systems with separate schemas.

Access patterns

Most important driver; derive required queries from each API endpoint.

Consistency requirements

Strong consistency favors one ACID database; eventual consistency allows distribution.

Keys Relationships

Key Types

Primary Keys should be stable system-generated IDs, Foreign Keys enforce referential integrity with write validation cost, and Constraints protect data quality with write overhead.
  • Primary Keys: Use stable system-generated IDs like user_id or post_id, not mutable business data.
  • Foreign Keys: Enforce referential integrity, but each insert or update must be validated.
  • Constraints: NOT NULL, UNIQUE, and CHECK protect data quality with added write overhead.

Relationship Cardinalities

One-to-many models parent-child ownership, Many-to-many uses a linking table such as likes, and One-to-one is rare and often should be merged.
  • One-to-many: Model parent-child ownership, such as user to posts or post to comments.
  • Many-to-many: Use a linking table such as likes to connect users and posts.
  • One-to-one: Rare in practice; often a sign two tables should be merged.

Access Indexes

posts.user_id supports fetching all posts by one user or GET /users/{id}/posts, posts.created_at supports loading recent posts in chronological order, and (user_id, created_at) supports a user's recent posts without scanning all posts.
  • posts.user_id: Supports fetching all posts by one user or GET /users/{id}/posts.
  • posts.created_at: Supports loading recent posts in chronological order.
  • (user_id, created_at): Composite index for a user's recent posts without scanning all posts.

Normalization Choice

Start Normalized is the default to store each fact once and avoid update anomalies, Denormalize Selectively fits analytics, audit trails, event logs, search, and heavily read-optimized systems, and Denormalized Cache keeps the source of truth clean while a cache holds precomputed joins or aggregations for fast reads.
  • Start Normalized: Default starting point; store each fact once to avoid update anomalies.
  • Denormalize Selectively: Use for analytics, audit trails, event logs, search, or heavily read-optimized systems.
  • Denormalized Cache: Source of truth stays clean; the cache holds precomputed joins or aggregations for fast reads.

Sharding Choice

Shard Key Selection

Shard by access pattern is the default when sharding is needed to keep related data together, while Avoid time-range sharding because current writes hit the latest shard in write-heavy systems.
  • Shard by access pattern: Default if sharding is needed; keep related data together, such as posts by user_id.
  • Avoid time-range sharding: Current writes hit the latest shard, creating a hot shard in write-heavy systems.

Sharding Constraints

Avoid cross-shard queries because querying and merging multiple shards is expensive and complex, and Treat shard key as permanent because it affects every query and is often permanent.
  • Avoid cross-shard queries: Querying and merging multiple shards is expensive and complex.
  • Treat shard key as permanent: Shard key choice affects every query and is often permanent.

Your account is free and you can post anonymously if you choose.

Reading Progress

On This Page

Interview Standards

Database Choice

Schema Drivers

Keys Relationships

Key Types

Relationship Cardinalities

Access Indexes

Normalization Choice

Sharding Choice

Shard Key Selection

Sharding Constraints

Questions
Meta SWE Interview QuestionsAmazon SWE Interview QuestionsGoogle SWE Interview QuestionsOpenAI SWE Interview QuestionsEngineering Manager (EM) Interview Questions
Learn
Learn System DesignLearn DSALearn BehavioralLearn ML System DesignLearn Low Level DesignGuided Practice
Links
FAQPricingGift PremiumHello Interview Premium
Legal
Terms and ConditionsPrivacy PolicySecurity
Contact
About UsProduct Support

7511 Greenwood Ave North Unit #4238 Seattle WA 98103


© 2026 Optick Labs Inc. All rights reserved.