Quick Reference: Sharding
What is Sharding?
- Horizontal partitioning of data across multiple database instances
- Each shard holds a subset of the total data
- Used when a single database can't handle the load or data volume
- A last resort: try read replicas, caching, and vertical scaling first
Sharding Strategies
- Range-Based: Shard by value ranges (e.g., user_id 1-1M on shard 1). Simple, but prone to hot spots.
- Hash-Based: hash(shard_key) % num_shards. Even distribution, but changing num_shards remaps most keys, so adding shards is hard.
- Directory-Based: A lookup table maps keys to shards. Flexible, but the lookup table can become a bottleneck.
- Consistent Hashing: Minimizes data movement when adding or removing shards. Well suited to dynamic scaling.
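
To make the trade-offs between these strategies concrete, here is a minimal Python sketch of range-based, hash-based, and consistent-hash routing. All names (`stable_hash`, `range_shard`, `hash_shard`, `ConsistentHashRing`, `moved_fraction`) are hypothetical, and the choice of MD5 and 100 virtual nodes per shard is an assumption for illustration, not a recommendation.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash; Python's built-in hash() is salted per process."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def range_shard(user_id: int, upper_bounds: list[int]) -> int:
    """Range-based: index of the first shard whose inclusive upper bound covers user_id."""
    idx = bisect.bisect_left(upper_bounds, user_id)
    if idx == len(upper_bounds):
        raise ValueError(f"user_id {user_id} is above every configured range")
    return idx


def hash_shard(key: str, num_shards: int) -> int:
    """Hash-based: hash(shard_key) % num_shards."""
    return stable_hash(key) % num_shards


class ConsistentHashRing:
    """Consistent hashing with virtual nodes (several ring points per shard)."""

    def __init__(self, shards: list[str], vnodes: int = 100) -> None:
        ring = sorted(
            (stable_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in ring]
        self._owners = [owner for _, owner in ring]

    def shard_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, stable_hash(key)) % len(self._points)
        return self._owners[idx]


def moved_fraction(keys: list[str], before, after) -> float:
    """Share of keys whose shard changes between two routing functions."""
    return sum(before(k) != after(k) for k in keys) / len(keys)
```

Comparing `moved_fraction` for `hash_shard` with 4 versus 5 shards against two `ConsistentHashRing` instances with 4 versus 5 shards shows the difference: modulo routing is expected to remap roughly four keys in five, while the ring moves roughly one key in five, and only onto the new shard.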
Choosing a Shard Key
- High cardinality (many unique values) so data distributes evenly
- Matches your most common query pattern
- Avoids hot spots (don't shard by date if most queries hit recent data)
- Common choices: user_id, tenant_id, geographic region
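
One rough way to compare candidate shard keys is to hash a sample of real key values and see how evenly they land. The sketch below assumes hash-based routing with SHA-256; `shard_of`, `shard_loads`, and `skew` are hypothetical helper names, and the skew metric (busiest shard divided by mean load) is just one simple choice.

```python
import hashlib
from collections import Counter


def shard_of(key: str, num_shards: int) -> int:
    """Route a key to a shard using a deterministic hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_loads(keys: list[str], num_shards: int) -> list[int]:
    """Number of sampled rows that would land on each shard."""
    counts = Counter(shard_of(k, num_shards) for k in keys)
    return [counts.get(s, 0) for s in range(num_shards)]


def skew(keys: list[str], num_shards: int) -> float:
    """Busiest shard's load divided by the mean load; 1.0 is perfectly even."""
    if not keys:
        raise ValueError("need at least one sampled key")
    loads = shard_loads(keys, num_shards)
    return max(loads) / (sum(loads) / num_shards)
```

A low-cardinality key such as a region code with three distinct values can occupy at most three of eight shards, so its skew is at least 8/3 no matter how the hash behaves. A high-cardinality key such as user_id should come out close to 1.0.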
Challenges
- Cross-shard queries: JOINs across shards are expensive or impossible
- Rebalancing: Moving data between shards when one grows too large
- Referential integrity: Foreign keys can't be enforced across shards
- Operational complexity: More databases to manage, monitor, and back up
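
The cost of cross-shard queries can be illustrated with a toy scatter-gather sketch. Everything here is hypothetical: `ShardedOrders` uses in-memory dicts as stand-ins for separate databases and assumes hash routing on user_id. A real system would issue network calls to each shard, which is where the expense comes from.

```python
import hashlib
from heapq import nlargest


class ShardedOrders:
    """Toy order store sharded by user_id; each dict stands in for one database."""

    def __init__(self, num_shards: int) -> None:
        self._shards: list[dict[str, list[float]]] = [{} for _ in range(num_shards)]

    def _shard_for(self, user_id: str) -> dict[str, list[float]]:
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return self._shards[int.from_bytes(digest[:8], "big") % len(self._shards)]

    def add_order(self, user_id: str, amount: float) -> None:
        self._shard_for(user_id).setdefault(user_id, []).append(amount)

    def orders_for(self, user_id: str) -> list[float]:
        # The query includes the shard key, so exactly one shard is consulted.
        return list(self._shard_for(user_id).get(user_id, []))

    def top_orders(self, n: int) -> list[float]:
        # No shard key in the query: ask every shard for its local top n,
        # then merge the partial results in the application.
        partial: list[float] = []
        for shard in self._shards:
            local = (amount for amounts in shard.values() for amount in amounts)
            partial.extend(nlargest(n, local))
        return nlargest(n, partial)
```

`orders_for` touches one shard, while `top_orders` touches all of them and merges in application code. This is why queries that match the shard key stay cheap and everything else grows with the number of shards.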
When to Shard
- A single database exceeds its storage capacity (roughly 10 TB or more)
- Write throughput exceeds what one database can handle
- Read replicas and caching aren't enough
- Regulatory requirements (data residency by region)