Limited Time Offer:Up to 20% off Hello Interview Premium

Up to 20% off Hello Interview Premium 🎉

Search

⌘K

Tutor

Quick Reference

Sharding

What is Sharding?

·
Horizontal partitioning of data across multiple database instances
·
Each shard holds a subset of the total data
·
Used when a single DB can't handle the load or data volume
·
Last resort - try read replicas, caching, and vertical scaling first

Sharding Strategies

·
Range-Based: Shard by value ranges (e.g., user_id 1-1M on shard 1). Simple but prone to hot spots.
·
Hash-Based: hash(shard_key) % num_shards. Even distribution but hard to add shards.
·
Directory-Based: Lookup table maps keys to shards. Flexible but lookup table becomes bottleneck.
·
Consistent Hashing: Minimizes data movement when adding/removing shards. Best for dynamic scaling.

Choosing a Shard Key

·
High cardinality (many unique values) to distribute evenly
·
Matches your most common query pattern
·
Avoids hot spots (don't shard by date if most queries hit recent data)
·
Common choices: user_id, tenant_id, geographic region

Challenges

·
Cross-shard queries: JOINs across shards are expensive or impossible
·
Rebalancing: Moving data between shards when one gets too large
·
Referential integrity: Foreign keys don't work across shards
·
Operational complexity: More databases to manage, monitor, backup

When to Shard

·
Single DB exceeds storage capacity (>10TB+)
·
Write throughput exceeds what one DB can handle
·
Read replicas and caching aren't enough
·
Regulatory requirements (data residency by region)

Your account is free and you can post anonymously if you choose.