How to Design Twitter: Step by Step System Design Interview Guide
2025-08-25
Designing Twitter in an interview is about turning a messy problem into a crisp plan. You do not need every corner case. You do need a clear flow from requirements to APIs to data to scaling. Here is a beginner friendly path you can follow and speak through with confidence.
Step 1: Clarify scope and requirements
Start tiny, then grow.
Functional
- Post a tweet up to 280 characters with optional media
- Follow and unfollow users
- Home timeline with tweets from people I follow
- User timeline for a specific user
- Like and retweet
- Search by keyword and hashtag
- Notifications for follows, likes, replies, mentions
Non functional
- High read volume on home timelines
- Reasonable write throughput on tweets and engagements
- P95 home timeline read in under 200 ms from cache for hot users
- Availability target 99.9 percent for read APIs
- Eventual consistency is acceptable for counts and feeds
State up front which features you are postponing. For example: threads, quote tweets, lists, DMs, and Spaces.
Step 2: Back of the envelope
Make light assumptions. It shows you can size a system.
- Daily active users: 50 million
- Average follows per user: 200
- Tweets per day: 200 million
- Peak write QPS: around 10k to 20k
- Home timeline reads: far larger than writes. Assume 200k to 400k QPS at peak
- Average tweet size on write: 300 bytes text plus small metadata. Media stored separately in object storage
This is enough to justify sharding and a heavy cache layer.
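A quick sanity check on those numbers in Python; the 5x peak factor is an assumption, the rest comes from the list above:

# Rough sizing from the assumptions above; round numbers only
SECONDS_PER_DAY = 86_400
tweets_per_day = 200_000_000
avg_write_qps = tweets_per_day / SECONDS_PER_DAY   # about 2.3k QPS
peak_write_qps = avg_write_qps * 5                 # assumed 5x peak factor, about 12k QPS
text_bytes_per_day = tweets_per_day * 300          # about 60 GB of tweet text per day
print(round(avg_write_qps), round(peak_write_qps), text_bytes_per_day / 1e9)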
Step 3: High level APIs
Keep them boring and clear.
POST /v1/tweets
Body: { "user_id": "u123", "text": "hello", "media_ids": [] }
Resp: { "tweet_id": "t987" }
GET /v1/users/:id/timeline?cursor=...&limit=50
GET /v1/home?cursor=...&limit=50
POST /v1/follow  { "follower": "uA", "followee": "uB" }
POST /v1/like    { "user_id": "uA", "tweet_id": "tX" }
POST /v1/retweet { "user_id": "uA", "tweet_id": "tX" }
GET  /v1/search?q=...&type=top|latest&cursor=...
Pagination uses opaque cursors so you can switch storage later.
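One simple way to keep a cursor opaque is to base64 encode the last seen position. A minimal sketch; the payload fields are illustrative and the server is free to change them later:

import base64
import json

def encode_cursor(last_tweet_id: str, last_created_at: int) -> str:
    # Pack the last seen position into an opaque token the client echoes back
    payload = json.dumps({"id": last_tweet_id, "ts": last_created_at})
    return base64.urlsafe_b64encode(payload.encode()).decode()

def decode_cursor(cursor: str) -> dict:
    # Inverse of encode_cursor; storage can change without breaking clients
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))

next_cursor = encode_cursor("t987", 1724563200)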
Step 4: Data model
You can do this in SQL or NoSQL. A hybrid is common.
Relational tables
-- tweets are immutable rows
CREATE TABLE tweets (
  tweet_id BIGINT PRIMARY KEY,
  user_id  BIGINT NOT NULL,
  text     VARCHAR(280) NOT NULL,
  created_at TIMESTAMPTZ NOT NULL,
  like_count INT DEFAULT 0,
  retweet_count INT DEFAULT 0
) PARTITION BY HASH (tweet_id);
CREATE TABLE follows (
  follower BIGINT NOT NULL,
  followee BIGINT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL,
  PRIMARY KEY (follower, followee)
);
CREATE INDEX idx_tweets_user_time ON tweets (user_id, created_at DESC);
Document or key value
- timeline:{user_id} stores a list of tweet ids for the home feed
- usertl:{user_id} stores a list for the user’s own tweets
- counts:{tweet_id} stores like and retweet counters for fast increments
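A minimal sketch of touching those keys with redis-py; the key names follow the list above and the trim length is an illustrative cap:

import redis

r = redis.Redis()

def push_home(user_id: str, tweet_id: str, max_len: int = 800) -> None:
    # Prepend to the follower's home feed and keep the list bounded
    key = f"timeline:{user_id}"
    r.lpush(key, tweet_id)
    r.ltrim(key, 0, max_len - 1)

def bump_like(tweet_id: str) -> int:
    # Fast counter increment; the durable count is reconciled asynchronously
    return r.hincrby(f"counts:{tweet_id}", "likes", 1)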
Media
- Raw images and video in object storage with a CDN in front
Step 5: Architecture sketch
- Clients (web and mobile)
- API gateway or load balancer
- Stateless feed service, tweet service, social graph service, engagement service
- Cache tier with Redis or Memcached
- Primary data stores: sharded SQL or a wide column store for timelines and relations
- Search index with Elasticsearch or OpenSearch for keywords and hashtags
- Object storage plus CDN for media
- Stream backbone such as Kafka or Pulsar for fanout, counters, and search indexing
- Notification workers
Keep components small and describe what each owns.
Step 6: Timeline strategy
This is the core of the interview. Explain tradeoffs between fanout on write and fanout on read.
Fanout on write
- When a user posts, push the tweet id into the home timeline lists of all followers
- Read becomes fast. Write becomes heavy for celebrities
Fanout on read
- Store only the source user timeline
- At read time, merge K source lists for the current user
- Reads are heavy and complex. Writes are cheap
Pragmatic hybrid
- Default to fanout on write
- Mark very large accounts as heavy. For those, do not push. At read time, merge the heavy accounts’ recent tweets into the reader’s cached home feed
- This reduces worst case writes while keeping common reads fast
Home timeline flow
- On tweet, publish event to Kafka
- Fanout workers read the event and push the tweet id to follower lists in Redis or a timeline store (a worker sketch follows this list)
- For heavy authors, skip large pushes and record to a side list
- On GET home, fetch from Redis list. If cursor is near the end, optionally merge in any heavy sources since the last refresh
- Backfill misses from the persistent store if cache is cold
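A minimal fanout worker sketch, assuming kafka-python for the consumer and a Redis set of follower ids standing in for the social graph service; the heavy threshold and key names are illustrative:

import json
import redis
from kafka import KafkaConsumer

r = redis.Redis()
HEAVY_FOLLOWERS = 100_000   # illustrative cutoff for the heavy author path

consumer = KafkaConsumer("tweets", value_deserializer=lambda b: json.loads(b))
for msg in consumer:
    event = msg.value   # expected shape: {"tweet_id", "author_id", "follower_count"}
    if event["follower_count"] >= HEAVY_FOLLOWERS:
        # Heavy author: record to a side list, readers merge it at read time
        r.lpush(f"heavy:{event['author_id']}", event["tweet_id"])
        r.ltrim(f"heavy:{event['author_id']}", 0, 199)
        continue
    # Normal author: push into each follower's cached home timeline
    for fid in r.smembers(f"followers:{event['author_id']}"):   # stand-in for the graph service
        r.lpush(f"timeline:{fid.decode()}", event["tweet_id"])
        r.ltrim(f"timeline:{fid.decode()}", 0, 799)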
Step 7: Caching plan
- Home timeline key timeline:{user_id} as a list of tweet ids. TTL several hours. Sliding refresh when the user is active
- Tweet objects cached by id to avoid rehydration from storage
- User profile and counts cached with short TTL
- Write through for like and retweet counters to avoid double reads
- Use read replicas behind caches for cold miss protection
Aim for a cache hit rate above 90 percent on home timelines. That allows a sub 50 ms median read at the service layer.
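A cache aside sketch for hydrating tweet objects by id; fetch_tweet_from_replica is a placeholder for the read replica lookup behind the cache:

import json
import redis

r = redis.Redis()
TWEET_TTL = 3600   # seconds; illustrative

def fetch_tweet_from_replica(tweet_id: str) -> dict:
    raise NotImplementedError("stand-in for the persistent store read")

def get_tweet(tweet_id: str) -> dict:
    key = f"tweet:{tweet_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)   # hot path: served from cache
    tweet = fetch_tweet_from_replica(tweet_id)
    r.set(key, json.dumps(tweet), ex=TWEET_TTL)   # populate on miss
    return tweet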
Step 8: Sharding and storage
- Tweets sharded by tweet_id with a time based high bit so ids grow roughly with time. Snowflake style ids are fine
- Follows sharded by follower so writes during follow actions are single shard
- Timelines live in Redis for hot lists. Also persist to a wide column store by user_id and reverse_time for recovery and long tails
- Use compaction workers to trim very long timelines and maintain order invariants
Call out why you chose the shard keys. The goal is to avoid hot partitions during spikes.
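Since Snowflake style ids came up above, here is a minimal sketch: 41 bits of milliseconds since a fixed epoch, 10 bits of worker id, 12 bits of sequence, so ids sort roughly by time. The bit split is the common one, not a requirement:

import time
import threading

EPOCH_MS = 1577836800000   # arbitrary fixed epoch (2020-01-01)
_lock = threading.Lock()
_last_ms = -1
_seq = 0

def next_id(worker_id: int) -> int:
    global _last_ms, _seq
    with _lock:
        now = int(time.time() * 1000)
        if now == _last_ms:
            _seq = (_seq + 1) & 0xFFF   # production code waits for the next ms on overflow
        else:
            _last_ms, _seq = now, 0
        return ((now - EPOCH_MS) << 22) | ((worker_id & 0x3FF) << 12) | _seq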
Step 9: Search and hashtags
- On tweet write, send an event to an indexer via Kafka
- Parse tokens and hashtags, store in an inverted index in Elasticsearch
- Two result types: latest sorted by time, top sorted by a relevance score that mixes recency and engagement
- Keep index lag under a few seconds. Eventual consistency is acceptable
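A sketch of the indexer consumer, assuming kafka-python and the Elasticsearch Python client (8.x style document argument); the index name and fields are illustrative:

import json
import re
from kafka import KafkaConsumer
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
consumer = KafkaConsumer("tweets", value_deserializer=lambda b: json.loads(b))

for msg in consumer:
    tweet = msg.value
    hashtags = re.findall(r"#(\w+)", tweet["text"])   # pull hashtags out on the way in
    es.index(index="tweets", id=tweet["tweet_id"], document={
        "user_id": tweet["user_id"],
        "text": tweet["text"],
        "hashtags": hashtags,
        "created_at": tweet["created_at"],
    })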
Step 10: Notifications
- Write likes, retweets, follows, and mentions to a compact topic
- De duplicate on a rolling window per actor and action to avoid spam
- Persist notifications per user with a read model that supports pagination
- Push through APNs and FCM for mobile
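The rolling window de duplication can be a single atomic Redis SET with NX and a TTL, keyed on actor, action, and target; a sketch with illustrative key names:

import redis

r = redis.Redis()
WINDOW = 3600   # seconds; illustrative rolling window

def should_notify(actor_id: str, action: str, target_id: str) -> bool:
    # True only the first time this actor/action/target appears inside the window
    key = f"notif_dedup:{actor_id}:{action}:{target_id}"
    return bool(r.set(key, 1, nx=True, ex=WINDOW))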
Step 11: Rate limits, abuse, and safety
- Per user and per IP limits at the edge
- Write burst control on tweets and follows
- Bot and spam detection with heuristic features such as age of account, follow churn, block lists
- Soft delete content first, then hard delete after retention if required
- Privacy controls: protected accounts, blocked relationships, filtered views
You do not need a perfect trust and safety plan, just show that you think about it.
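For the edge limits, a fixed window counter in Redis is the simplest starting point (token bucket or sliding window are common refinements); a sketch with illustrative limits:

import time
import redis

r = redis.Redis()

def allow(user_id: str, action: str, limit: int = 300, window_s: int = 900) -> bool:
    # At most `limit` actions per `window_s` seconds per user
    window = int(time.time()) // window_s
    key = f"ratelimit:{action}:{user_id}:{window}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_s)   # first hit in the window sets the expiry
    return count <= limit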
Step 12: Consistency choices
- Timelines are eventually consistent. A new tweet may take seconds to appear for all followers
- Tweet object and user profile reads are strongly consistent on write back where possible
- Counters use CRDT style merges or idempotent increments applied from a log, then materialized into views, since exact values are not critical in real time
- Follows should be strongly consistent for the follower’s own view
Explain where you accept staleness and why it is safe.
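For the counters, idempotent increments mean each engagement event carries a unique id and is applied at most once; a Redis sketch where the guard key makes log replays harmless (key names are illustrative):

import redis

r = redis.Redis()

def apply_like(event_id: str, tweet_id: str) -> None:
    # The NX guard ensures a replayed log event does not double count
    if r.set(f"applied:{event_id}", 1, nx=True, ex=7 * 24 * 3600):
        r.hincrby(f"counts:{tweet_id}", "likes", 1)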
Step 13: Observability and reliability
- Structured logs with request ids passed across services
- Metrics for RED: rate, errors, duration. Separate dashboards for read and write paths
- Tracing for tweet publish to fanout to home read
- SLOs and alerts on cache hit rate, Kafka lag, and p95 home read latency
- Graceful degradation: if fanout workers lag, home read should fall back to merging source timelines
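A minimal RED metrics sketch using prometheus_client; the metric names are illustrative:

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("home_read_requests_total", "Home timeline reads", ["status"])
LATENCY = Histogram("home_read_latency_seconds", "Home timeline read latency")

@LATENCY.time()
def get_home(user_id: str) -> list:
    try:
        items = []   # fetch ids from Redis, hydrate, and so on
        REQUESTS.labels(status="ok").inc()
        return items
    except Exception:
        REQUESTS.labels(status="error").inc()
        raise

start_http_server(9100)   # expose /metrics for the scraper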
Step 14: Failure testing and recovery
- Region outage plan. Keep cross region replication for data stores and Kafka
- Cache flush plan. On mass invalidation, protect backends with a two layer cache and request coalescing
- Backfill job to regenerate home timelines for users who miss pushes during incidents
Have one clean disaster story ready. That shows maturity.
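The request coalescing mentioned above (sometimes called single flight) means concurrent misses for the same key share one backend call; a minimal in-process sketch, with result cleanup omitted for brevity:

import threading

_inflight: dict = {}
_results: dict = {}
_lock = threading.Lock()

def coalesced_get(key: str, loader):
    # First caller for a key becomes the leader and loads; the rest wait for its result
    with _lock:
        event = _inflight.get(key)
        leader = event is None
        if leader:
            event = _inflight[key] = threading.Event()
    if leader:
        try:
            _results[key] = loader(key)
        finally:
            event.set()
            with _lock:
                _inflight.pop(key, None)
        return _results[key]
    event.wait()
    return _results.get(key)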
Step 15: Walk through a request
Write
- Client calls POST /tweets
- Tweet service writes to primary store and cache
- Publish event to Kafka
- Fanout workers push to follower timelines or mark heavy path
Read
- Client calls GET /home?cursor=...
- Timeline service fetches ids from Redis list
- Hydrate tweets and authors from cache, fall back to store
- Return items plus a stable cursor
Narrate the happy path in under a minute.
Step 16: Complexity and costs
- Write complexity is proportional to follower count for normal users, capped by heavy user path
- Read complexity from cache is O(k) for page size
- Storage for tweets: 200 million per day times average 500 bytes including metadata is about 100 GB per day before compression, plus media in object storage
- Redis memory: keep the last few thousand tweet ids per user. With 50 million users and 2k ids each that is hundreds of gigabytes, so cache only active users or tier the cache so you do not blow memory (quick math below)
Use round numbers. Interviewers only want to see your sense of scale.
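The memory math behind that tiering point, with the same round numbers (the active share is an assumption):

# Naive timeline cache sizing versus a tiered cache of active users only
users = 50_000_000
ids_per_user = 2_000
bytes_per_id = 8                      # raw 64 bit id; Redis list overhead adds more
naive_gb = users * ids_per_user * bytes_per_id / 1e9   # about 800 GB before overhead
active_share = 0.2                    # illustrative share of recently active users
tiered_gb = naive_gb * active_share   # about 160 GB spread across a cluster
print(round(naive_gb), round(tiered_gb))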
Step 17: What to say at the end
- You can deliver a working v1 with fanout on write, Redis timelines, sharded tweets, and search indexers
- You know how to evolve toward heavy user handling, multi region, and better abuse controls
- You can trade exact counters and perfect freshness for speed and cost
Quick checklist you can bring to any feed design
- Requirements first
- APIs next
- Data model and id strategy
- Timeline plan: write, read, or hybrid
- Cache keys and TTLs
- Sharding and hot key avoidance
- Search indexing pipeline
- Notifications and rate limits
- Consistency choices
- Observability and failure modes
If you want a practice buddy that can generate a clean system design outline or a sample diagram while you rehearse, take a look at StealthCoder. It captures a prompt and returns a step by step plan you can study and explain. Helpful when you have to cover timelines, sharding, and fanout in a short mock.