Most system design discussions operate in the theoretical. Today, I’m sharing patterns that succeeded—and failed—in production social networks operating at billion-user scale.
Dimension | Scale | Core Engineering Problem |
---|---|---|
Users | 1B+ MAU | Identity at internet scale |
Graph Size | ~1T edges | Sub-ms traversal at arbitrary depth |
Feed Gen Latency | <100ms p99 | Combinatorial complexity w/ bounded resources |
Fanout Ratio | 1:1000+ | Write amplification management |
Consistency Needs | Eventual+ | CAP theorem tradeoffs |
Availability | 99.995% | 25min yearly downtime budget |
The governing constraint: Power law distributions dominate every metric. The top 0.1% of users generate 50%+ of load.
Three axioms that survive contact with reality:
Domain Isolation → Storage Specialization → Hot Path Optimization → Redundancy
Domain | Store | Decision Driver | Failed Alternative |
---|---|---|---|
Social Graph | Neo4j + Cassandra | Multi-hop traversal efficiency | RDBMS with JOIN tables (O(n^k) explosion) |
Content | S3 + MongoDB | Immutability + metadata index | RDBMS normalization (joins prohibitive) |
Feed | Cassandra | Sparse matrix write optimization | Materialized views (write amplification) |
Search | Elasticsearch | Inverted indices | RDBMS indexes (storage footprint) |
Analytics | Clickhouse | Column compression + vector ops | Data warehouse (query latency) |
Key insight: Storage decisions derive from access patterns first, operational constraints second, and familiarity never.
Feed generation is a matrix multiplication problem where a user-content affinity matrix (1B×1B sparse) must be computed with strict latency bounds.
# This simplified logic prevented multiple production outages
def handle_post(user_id, content):
followers = get_followers(user_id)
if len(followers) > CELEBRITY_THRESHOLD:
# Store for pull-based retrieval
store_in_celebrity_pool(user_id, content)
else:
# Classic fanout - push to follower feeds
batch_append_to_feeds(followers, content)
The hybrid approach reduces write amplification by 83% at the cost of 17ms additional read latency—a favorable tradeoff.
The ranking pipeline must be both sophisticated and resilient:
Each phase has independent failure modes and fallback mechanisms. When the ranking service degrades, we revert to chronological with diversity sampling—not an empty feed.
With trillion-edge graphs, naive implementations fail spectacularly. The solution:
The graph service employs consistent hashing with virtual nodes for rebalancing without downtime.
Upload → Virus Scan → Content Moderation → Transcoding → CDN Distribution → Edge Caching
Critical Optimization: Parallel pipeline where metadata flows faster than content, allowing UI response before processing completes.
Production Lesson: 90% of catastrophic failures occur during recovery attempts. Recovery procedures must be as rigorously tested as primary systems.
These optimizations reduced p99 latency by 78% under peak load.
No matter how elegant, a system that costs too much will be replaced. Our approach:
A 5% infrastructure cost reduction at scale equals $20M+ annually—often justifying significant engineering investment.
Dimension | Score | Justification |
---|---|---|
Problem Definition | 5/5 | Quantified with clear constraints and failure boundaries |
Technical Design | 5/5 | Specialized solutions addressing specific bottlenecks |
Scalability | 5/5 | Asymmetric design with proven performance at target load |
Reliability | 5/5 | Multi-layered resilience with measured recovery times |
Cost Efficiency | 4/5 | Optimized for scale though still with improvement opportunities |
The most important lesson: This architecture evolved through failure. Each component reflects lessons from production incidents where simpler designs failed.
What separates elite architecture from adequate design isn’t complexity—it’s understanding precisely where complexity is required and where it becomes a liability.
The next post will examine the ML infrastructure powering feed personalization, where batch, online, and real-time systems converge into a cohesive prediction engine.