Interview Standards
Show minimum schema
Include key fields, relationships, and index or partition notes beside the database.
Tie choices to requirements
Say how data volume, access pattern, or consistency need caused each schema choice.
Stay domain-specific
Use users, tweets, follows for Twitter-like systems, not abstract entity names.
Do not over-model
Good enough is clear and functional; no full normalization or complete schema diagram expected.
Database Choice
- Relational Databases (SQL): Default for most designs; PostgreSQL handles clear entities, joins, and ACID transactions.
- Document Databases: Use for changing, nested, or varied records; eliminates joins but complicates updates.
- Key-Value Stores: Use for caches, sessions, feature flags, or exact-key lookups; limited query capability.
- Wide-Column Databases: Use for enormous writes, time-series, or append-heavy analytics; model around queries.
- Graph Databases: Almost never in interviews; traversal gains rarely justify operational complexity.
Schema Drivers
Data volume
Decides whether data fits one store or must split across systems with separate schemas.
Access patterns
Most important driver; derive required queries from each API endpoint.
Consistency requirements
Strong consistency favors one ACID database; eventual consistency allows distribution.
Keys Relationships
Key Types
- Primary Keys: Use stable system-generated IDs like user_id or post_id, not mutable business data.
- Foreign Keys: Enforce referential integrity, but each insert or update must be validated.
- Constraints: NOT NULL, UNIQUE, and CHECK protect data quality with added write overhead.
Relationship Cardinalities
- One-to-many: Model parent-child ownership, such as user to posts or post to comments.
- Many-to-many: Use a linking table such as likes to connect users and posts.
- One-to-one: Rare in practice; often a sign two tables should be merged.
Access Indexes
- posts.user_id: Supports fetching all posts by one user or GET /users/{id}/posts.
- posts.created_at: Supports loading recent posts in chronological order.
- (user_id, created_at): Composite index for a user's recent posts without scanning all posts.
Normalization Choice
- Start Normalized: Default starting point; store each fact once to avoid update anomalies.
- Denormalize Selectively: Use for analytics, audit trails, event logs, search, or heavily read-optimized systems.
- Denormalized Cache: Source of truth stays clean; the cache holds precomputed joins or aggregations for fast reads.
Sharding Choice
Shard Key Selection
- Shard by access pattern: Default if sharding is needed; keep related data together, such as posts by user_id.
- Avoid time-range sharding: Current writes hit the latest shard, creating a hot shard in write-heavy systems.
Sharding Constraints
- Avoid cross-shard queries: Querying and merging multiple shards is expensive and complex.
- Treat shard key as permanent: Shard key choice affects every query and is often permanent.

Your account is free and you can post anonymously if you choose.