How to Design Twitter: Step by Step System Design Interview Guide
2025-08-25
Designing Twitter in an interview is about turning a messy problem into a crisp plan. You do not need every corner case. You do need a clear flow from requirements to APIs to data to scaling. Here is a beginner friendly path you can follow and speak through with confidence.
Step 1: Clarify scope and requirements
Start tiny, then grow.
Functional
- Post a tweet up to 280 characters with optional media
- Follow and unfollow users
- Home timeline with tweets from people I follow
- User timeline for a specific user
- Like and retweet
- Search by keyword and hashtag
- Notifications for follows, likes, replies, mentions
Non functional
- High read volume on home timelines
- Reasonable write throughput on tweets and engagements
- P95 home timeline read in under 200 ms from cache for hot users
- Availability target 99.9 percent for read APIs
- Eventual consistency is acceptable for counts and feeds
State up front which features you are postponing. For example: threads, quote tweets, lists, DMs, and Spaces.
Step 2: Back of the envelope
Make light assumptions. It shows you can size a system.
- Daily active users: 50 million
- Average follows per user: 200
- Tweets per day: 200 million
- Peak write QPS: around 10k to 20k
- Home timeline reads: far larger than writes. Assume 200k to 400k QPS at peak
- Average tweet size on write: 300 bytes text plus small metadata. Media stored separately in object storage
This is enough to justify sharding and a heavy cache layer.
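A quick sanity check on those numbers in Python; the 5x peak factor is an assumption, the rest comes from the list above:

# Rough sizing from the assumptions above; round numbers only
SECONDS_PER_DAY = 86_400
tweets_per_day = 200_000_000
avg_write_qps = tweets_per_day / SECONDS_PER_DAY   # about 2.3k QPS
peak_write_qps = avg_write_qps * 5                 # assumed 5x peak factor, about 12k QPS
text_bytes_per_day = tweets_per_day * 300          # about 60 GB of tweet text per day
print(round(avg_write_qps), round(peak_write_qps), text_bytes_per_day / 1e9)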
Step 3: High level APIs
Keep them boring and clear.
POST /v1/tweets
Body: { "user_id": "u123", "text": "hello", "media_ids": [] }
Resp: { "tweet_id": "t987" }
GET /v1/users/:id/timeline?cursor=...&limit=50
GET /v1/home?cursor=...&limit=50
POST /v1/follow  { "follower": "uA", "followee": "uB" }
POST /v1/like    { "user_id": "uA", "tweet_id": "tX" }
POST /v1/retweet { "user_id": "uA", "tweet_id": "tX" }
GET  /v1/search?q=...&type=top|latest&cursor=...
Pagination uses opaque cursors so you can switch storage later.
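One simple way to keep a cursor opaque is to base64 encode the last seen position. A minimal sketch; the payload fields are illustrative and the server is free to change them later:

import base64
import json

def encode_cursor(last_tweet_id: str, last_created_at: int) -> str:
    # Pack the last seen position into an opaque token the client echoes back
    payload = json.dumps({"id": last_tweet_id, "ts": last_created_at})
    return base64.urlsafe_b64encode(payload.encode()).decode()

def decode_cursor(cursor: str) -> dict:
    # Inverse of encode_cursor; storage can change without breaking clients
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))

next_cursor = encode_cursor("t987", 1724563200)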
Step 4: Data model
You can do this in SQL or NoSQL. A hybrid is common.
Relational tables
-- tweets are immutable rows
CREATE TABLE tweets (
  tweet_id BIGINT PRIMARY KEY,
  user_id  BIGINT NOT NULL,
  text     VARCHAR(280) NOT NULL,
  created_at TIMESTAMPTZ NOT NULL,
  like_count INT DEFAULT 0,
  retweet_count INT DEFAULT 0
) PARTITION BY HASH (tweet_id);
CREATE TABLE follows (
  follower BIGINT NOT NULL,
  followee BIGINT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL,
  PRIMARY KEY (follower, followee)
);
CREATE INDEX idx_tweets_user_time ON tweets (user_id, created_at DESC);
Document or key value
- timeline:{user_id} stores a list of tweet ids for the home feed
- usertl:{user_id} stores a list for the user’s own tweets
- counts:{tweet_id} stores like and retweet counters for fast increments
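A minimal sketch of touching those keys with redis-py; the key names follow the list above and the trim length is an illustrative cap:

import redis

r = redis.Redis()

def push_home(user_id: str, tweet_id: str, max_len: int = 800) -> None:
    # Prepend to the follower's home feed and keep the list bounded
    key = f"timeline:{user_id}"
    r.lpush(key, tweet_id)
    r.ltrim(key, 0, max_len - 1)

def bump_like(tweet_id: str) -> int:
    # Fast counter increment; the durable count is reconciled asynchronously
    return r.hincrby(f"counts:{tweet_id}", "likes", 1)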
Media
- Raw images and video in object storage with a CDN in front
Step 5: Architecture sketch
- Clients (web and mobile)
- API gateway or load balancer
- Stateless feed service, tweet service, social graph service, engagement service
- Cache tier with Redis or Memcached
- Primary data stores: sharded SQL or a wide column store for timelines and relations
- Search index with Elasticsearch or OpenSearch for keywords and hashtags
- Object storage plus CDN for media
- Stream backbone such as Kafka or Pulsar for fanout, counters, and search indexing
- Notification workers
Keep components small and describe what each owns.
Step 6: Timeline strategy
This is the core of the interview. Explain tradeoffs between fanout on write and fanout on read.
Fanout on write
- When a user posts, push the tweet id into the home timeline lists of all followers
- Read becomes fast. Write becomes heavy for celebrities
Fanout on read
- Store only the source user timeline
- At read time, merge K source lists for the current user
- Reads are heavy and complex. Writes are cheap
Pragmatic hybrid
- Default to fanout on write
- Mark very large accounts as heavy. For those, do not push. At read time, merge the heavy accounts’ recent tweets into the reader’s cached home feed
- This reduces worst case writes while keeping common reads fast
Home timeline flow
- On tweet, publish event to Kafka
- Fanout workers read the event and push the tweet id to follower lists in Redis or a timeline store (a worker sketch follows this list)
- For heavy authors, skip large pushes and record to a side list
- On GET home, fetch from Redis list. If cursor is near the end, optionally merge in any heavy sources since the last refresh
- Backfill misses from the persistent store if cache is cold
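A minimal fanout worker sketch, assuming kafka-python for the consumer and a Redis set of follower ids standing in for the social graph service; the heavy threshold and key names are illustrative:

import json
import redis
from kafka import KafkaConsumer

r = redis.Redis()
HEAVY_FOLLOWERS = 100_000   # illustrative cutoff for the heavy author path

consumer = KafkaConsumer("tweets", value_deserializer=lambda b: json.loads(b))
for msg in consumer:
    event = msg.value   # expected shape: {"tweet_id", "author_id", "follower_count"}
    if event["follower_count"] >= HEAVY_FOLLOWERS:
        # Heavy author: record to a side list, readers merge it at read time
        r.lpush(f"heavy:{event['author_id']}", event["tweet_id"])
        r.ltrim(f"heavy:{event['author_id']}", 0, 199)
        continue
    # Normal author: push into each follower's cached home timeline
    for fid in r.smembers(f"followers:{event['author_id']}"):   # stand-in for the graph service
        r.lpush(f"timeline:{fid.decode()}", event["tweet_id"])
        r.ltrim(f"timeline:{fid.decode()}", 0, 799)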
Step 7: Caching plan
- Home timeline key timeline:{user_id} as a list of tweet ids. TTL several hours. Sliding refresh when the user is active
- Tweet objects cached by id to avoid rehydration from storage
- User profile and counts cached with short TTL
- Write through for like and retweet counters to avoid double reads
- Use read replicas behind caches for cold miss protection
Aim for a cache hit rate above 90 percent on home timelines. That allows a sub 50 ms median read at the service layer.
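A cache aside sketch for hydrating tweet objects by id; fetch_tweet_from_replica is a placeholder for the read replica lookup behind the cache:

import json
import redis

r = redis.Redis()
TWEET_TTL = 3600   # seconds; illustrative

def fetch_tweet_from_replica(tweet_id: str) -> dict:
    raise NotImplementedError("stand-in for the persistent store read")

def get_tweet(tweet_id: str) -> dict:
    key = f"tweet:{tweet_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)   # hot path: served from cache
    tweet = fetch_tweet_from_replica(tweet_id)
    r.set(key, json.dumps(tweet), ex=TWEET_TTL)   # populate on miss
    return tweet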
Step 8: Sharding and storage
- Tweets sharded by tweet_id with a time based high bit so ids grow roughly with time. Snowflake style ids are fine
- Follows sharded by follower so writes during follow actions are single shard
- Timelines live in Redis for hot lists. Also persist to a wide column store by user_id and reverse_time for recovery and long tails
- Use compaction workers to trim very long timelines and maintain order invariants
Call out why you chose the shard keys. The goal is to avoid hot partitions during spikes.
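Since Snowflake style ids came up above, here is a minimal sketch: 41 bits of milliseconds since a fixed epoch, 10 bits of worker id, 12 bits of sequence, so ids sort roughly by time. The bit split is the common one, not a requirement:

import time
import threading

EPOCH_MS = 1577836800000   # arbitrary fixed epoch (2020-01-01)
_lock = threading.Lock()
_last_ms = -1
_seq = 0

def next_id(worker_id: int) -> int:
    global _last_ms, _seq
    with _lock:
        now = int(time.time() * 1000)
        if now == _last_ms:
            _seq = (_seq + 1) & 0xFFF   # production code waits for the next ms on overflow
        else:
            _last_ms, _seq = now, 0
        return ((now - EPOCH_MS) << 22) | ((worker_id & 0x3FF) << 12) | _seq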
Step 9: Search and hashtags
- On tweet write, send an event to an indexer via Kafka
- Parse tokens and hashtags, store in an inverted index in Elasticsearch
- Two result types: latest sorted by time, top sorted by a relevance score that mixes recency and engagement
- Keep index lag under a few seconds. Eventual consistency is acceptable
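A sketch of the indexer consumer, assuming kafka-python and the Elasticsearch Python client (8.x style document argument); the index name and fields are illustrative:

import json
import re
from kafka import KafkaConsumer
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
consumer = KafkaConsumer("tweets", value_deserializer=lambda b: json.loads(b))

for msg in consumer:
    tweet = msg.value
    hashtags = re.findall(r"#(\w+)", tweet["text"])   # pull hashtags out on the way in
    es.index(index="tweets", id=tweet["tweet_id"], document={
        "user_id": tweet["user_id"],
        "text": tweet["text"],
        "hashtags": hashtags,
        "created_at": tweet["created_at"],
    })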
Step 10: Notifications
- Write likes, retweets, follows, and mentions to a compact topic
- De duplicate on a rolling window per actor and action to avoid spam
- Persist notifications per user with a read model that supports pagination
- Push through APNs and FCM for mobile
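The rolling window de duplication can be a single atomic Redis SET with NX and a TTL, keyed on actor, action, and target; a sketch with illustrative key names:

import redis

r = redis.Redis()
WINDOW = 3600   # seconds; illustrative rolling window

def should_notify(actor_id: str, action: str, target_id: str) -> bool:
    # True only the first time this actor/action/target appears inside the window
    key = f"notif_dedup:{actor_id}:{action}:{target_id}"
    return bool(r.set(key, 1, nx=True, ex=WINDOW))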
Step 11: Rate limits, abuse, and safety
- Per user and per IP limits at the edge
- Write burst control on tweets and follows
- Bot and spam detection with heuristic features such as age of account, follow churn, block lists
- Soft delete content first, then hard delete after retention if required
- Privacy controls: protected accounts, blocked relationships, filtered views
You do not need a perfect trust and safety plan, just show that you think about it.
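For the edge limits, a fixed window counter in Redis is the simplest starting point (token bucket or sliding window are common refinements); a sketch with illustrative limits:

import time
import redis

r = redis.Redis()

def allow(user_id: str, action: str, limit: int = 300, window_s: int = 900) -> bool:
    # At most `limit` actions per `window_s` seconds per user
    window = int(time.time()) // window_s
    key = f"ratelimit:{action}:{user_id}:{window}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_s)   # first hit in the window sets the expiry
    return count <= limit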
Step 12: Consistency choices
- Timelines are eventually consistent. A new tweet may take seconds to appear for all followers
- Tweet object and user profile reads are strongly consistent on write back where possible
- Counters use CRDT style merges or idempotent increments applied from a log, then materialized into views, since exact values are not critical in real time
- Follows should be strongly consistent for the follower’s own view
Explain where you accept staleness and why it is safe.
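For the counters, idempotent increments mean each engagement event carries a unique id and is applied at most once; a Redis sketch where the guard key makes log replays harmless (key names are illustrative):

import redis

r = redis.Redis()

def apply_like(event_id: str, tweet_id: str) -> None:
    # The NX guard ensures a replayed log event does not double count
    if r.set(f"applied:{event_id}", 1, nx=True, ex=7 * 24 * 3600):
        r.hincrby(f"counts:{tweet_id}", "likes", 1)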
Step 13: Observability and reliability
- Structured logs with request ids passed across services
- Metrics for RED: rate, errors, duration. Separate dashboards for read and write paths
- Tracing for tweet publish to fanout to home read
- SLOs and alerts on cache hit rate, Kafka lag, and p95 home read latency
- Graceful degradation: if fanout workers lag, home read should fall back to merging source timelines
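A minimal RED metrics sketch using prometheus_client; the metric names are illustrative:

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("home_read_requests_total", "Home timeline reads", ["status"])
LATENCY = Histogram("home_read_latency_seconds", "Home timeline read latency")

@LATENCY.time()
def get_home(user_id: str) -> list:
    try:
        items = []   # fetch ids from Redis, hydrate, and so on
        REQUESTS.labels(status="ok").inc()
        return items
    except Exception:
        REQUESTS.labels(status="error").inc()
        raise

start_http_server(9100)   # expose /metrics for the scraper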
Step 14: Failure testing and recovery
- Region outage plan. Keep cross region replication for data stores and Kafka
- Cache flush plan. On mass invalidation, protect backends with a two layer cache and request coalescing
- Backfill job to regenerate home timelines for users who miss pushes during incidents
Have one clean disaster story ready. That shows maturity.
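The request coalescing mentioned above (sometimes called single flight) means concurrent misses for the same key share one backend call; a minimal in-process sketch, with result cleanup omitted for brevity:

import threading

_inflight: dict = {}
_results: dict = {}
_lock = threading.Lock()

def coalesced_get(key: str, loader):
    # First caller for a key becomes the leader and loads; the rest wait for its result
    with _lock:
        event = _inflight.get(key)
        leader = event is None
        if leader:
            event = _inflight[key] = threading.Event()
    if leader:
        try:
            _results[key] = loader(key)
        finally:
            event.set()
            with _lock:
                _inflight.pop(key, None)
        return _results[key]
    event.wait()
    return _results.get(key)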
Step 15: Walk through a request
Write
- Client calls POST /tweets
- Tweet service writes to primary store and cache
- Publish event to Kafka
- Fanout workers push to follower timelines or mark heavy path
Read
- Client calls GET /home?cursor=...
- Timeline service fetches ids from Redis list
- Hydrate tweets and authors from cache, fall back to store
- Return items plus a stable cursor
Narrate the happy path in under a minute.
Step 16: Complexity and costs
- Write complexity is proportional to follower count for normal users, capped by heavy user path
- Read complexity from cache is O(k) for page size
- Storage for tweets: 200 million per day times average 500 bytes including metadata is about 100 GB per day before compression, plus media in object storage
- Redis memory: keep the last few thousand tweet ids per user. With 50 million users and 2k ids each that is hundreds of gigabytes, so cache only active users or tier the cache so you do not blow memory (quick math below)
Use round numbers. Interviewers only want to see your sense of scale.
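The memory math behind that tiering point, with the same round numbers (the active share is an assumption):

# Naive timeline cache sizing versus a tiered cache of active users only
users = 50_000_000
ids_per_user = 2_000
bytes_per_id = 8                      # raw 64 bit id; Redis list overhead adds more
naive_gb = users * ids_per_user * bytes_per_id / 1e9   # about 800 GB before overhead
active_share = 0.2                    # illustrative share of recently active users
tiered_gb = naive_gb * active_share   # about 160 GB spread across a cluster
print(round(naive_gb), round(tiered_gb))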
Step 17: What to say at the end
- You can deliver a working v1 with fanout on write, Redis timelines, sharded tweets, and search indexers
- You know how to evolve toward heavy user handling, multi region, and better abuse controls
- You can trade exact counters and perfect freshness for speed and cost
Quick checklist you can bring to any feed design
- Requirements first
- APIs next
- Data model and id strategy
- Timeline plan: write, read, or hybrid
- Cache keys and TTLs
- Sharding and hot key avoidance
- Search indexing pipeline
- Notifications and rate limits
- Consistency choices
- Observability and failure modes
If you want a practice buddy that can generate a clean system design outline or a sample diagram while you rehearse, take a look at StealthCoder. It captures a prompt and returns a step by step plan you can study and explain. Helpful when you have to cover timelines, sharding, and fanout in a short mock.