How to Solve System Design Questions in Go (Golang)
2025-08-18
System design interviews reward clear thinking, strong tradeoff skills, and practical implementation details. Go is a great language for this because it gives you lightweight concurrency, a simple standard library, and predictable performance. This guide shows a step-by-step way to approach any system design question, then maps each step to concrete Golang patterns and code you can sketch on a whiteboard or in an editor.
Step 1: Clarify the problem and constraints
Ask for:
- Core use cases and non-goals
- Scale targets: QPS, daily active users, data size, expected growth
- Latency and availability objectives
- Read vs write ratio, burstiness, geo distribution, data retention
Write down success criteria. Tie every later choice back to these numbers.
Step 2: Define APIs and contracts first
Nail the surface area before internals.
Example: URL shortener
POST /v1/links
Body: { "long_url": "https://example.com/..." }
Resp: { "short": "abc123" }
GET /v/:code -> 301 Location: <long_url>
In Go, model the contract with interfaces so you can swap storage later.
type LinkStore interface {
	Create(ctx context.Context, long string) (code string, err error)
	Resolve(ctx context.Context, code string) (long string, err error)
}
Step 3: Choose data models and storage
Map requirements to persistence.
- Key value: Redis, DynamoDB, Badger, PostgreSQL with simple schema
- Document or column oriented for flexible attributes
- Time series for metrics and logs
Golang data model example:
type Link struct {
	Code    string
	LongURL string
	Created time.Time
	Creator string
}
Schema in PostgreSQL:
CREATE TABLE links(
  code     TEXT PRIMARY KEY,
  long_url TEXT NOT NULL,
  created  TIMESTAMPTZ NOT NULL DEFAULT now(),
  creator  TEXT
);
CREATE INDEX idx_links_created ON links(created);
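A sketch of a LinkStore implementation over this schema, assuming database/sql with a registered PostgreSQL driver. PGStore and newCode are names invented for this example; newCode stands in for whatever short-code scheme you choose, and collision handling is omitted.
// PostgreSQL-backed LinkStore (sketch). Assumes database/sql, context, math/rand.
type PGStore struct {
	db *sql.DB
}

func (s *PGStore) Create(ctx context.Context, long string) (string, error) {
	code := newCode() // toy generator below; a real design would handle collisions
	_, err := s.db.ExecContext(ctx,
		`INSERT INTO links(code, long_url) VALUES ($1, $2)`, code, long)
	return code, err
}

func (s *PGStore) Resolve(ctx context.Context, code string) (string, error) {
	var long string
	err := s.db.QueryRowContext(ctx,
		`SELECT long_url FROM links WHERE code = $1`, code).Scan(&long)
	return long, err
}

// Toy code generator for the sketch: 6 random alphanumeric characters.
func newCode() string {
	const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
	b := make([]byte, 6)
	for i := range b {
		b[i] = alphabet[rand.Intn(len(alphabet))]
	}
	return string(b)
}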
Step 4: Design the high level architecture
Start simple, then add scaling pieces as needed:
- Client -> API Gateway or load balancer
- Stateless Go web tier
- Cache layer for hot reads
- Primary store for durability
- Async workers for slow tasks
- Observability stack
Say how each piece meets the SLA. Call out the single-writer vs multi-leader tradeoff, read replicas, and cache policies.
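To make the boxes concrete, here is a minimal wiring sketch for the stateless Go web tier, assuming the LinkStore interface from Step 2 and Go 1.22's pattern-matching http.ServeMux; buildMux is a name invented for this example, and a third-party router would look much the same. The cache and worker layers would wrap or sit behind the store.
// Stateless web tier that depends only on the LinkStore interface (sketch).
func buildMux(store LinkStore) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("POST /v1/links", func(w http.ResponseWriter, r *http.Request) {
		// decode the JSON body, call store.Create, return {"short": code}
	})
	mux.HandleFunc("GET /v/{code}", func(w http.ResponseWriter, r *http.Request) {
		long, err := store.Resolve(r.Context(), r.PathValue("code"))
		if err != nil {
			http.NotFound(w, r)
			return
		}
		http.Redirect(w, r, long, http.StatusMovedPermanently)
	})
	return mux
}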
Step 5: Plan caching, queues, and consistency
- Caching: read-through for idempotent GETs, write-through for strong read-after-write, TTL and invalidation rules (a write-through sketch follows the cache wrapper below)
- Queues: absorb bursts, decouple ingestion from processing
- Consistency: pick per operation. Strong where needed, eventual when safe. Use idempotency keys on retries.
Go helpers:
// Read-through cache wrapper
type CachedStore struct {
	Store LinkStore
	Cache *ristretto.Cache
}

func (c *CachedStore) Resolve(ctx context.Context, code string) (string, error) {
	if v, ok := c.Cache.Get(code); ok {
		return v.(string), nil
	}
	long, err := c.Store.Resolve(ctx, code)
	if err == nil { c.Cache.Set(code, long, 1) }
	return long, err
}
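The write-through path mentioned above could look like this sketch: persist first, then fill the cache so a read right after the write is a hit. With this, CachedStore satisfies LinkStore and can be dropped in wherever the store is used.
// Write-through: persist first, then populate the cache on success.
func (c *CachedStore) Create(ctx context.Context, long string) (string, error) {
	code, err := c.Store.Create(ctx, long)
	if err == nil {
		c.Cache.Set(code, long, 1)
	}
	return code, err
}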
Step 6: Concurrency, backpressure, and timeouts in Go
Make concurrency explicit and safe.
- Use context.Context to propagate timeouts and cancelation
- Prefer worker pools with bounded concurrency over unbounded goroutines
- Add rate limits per user or per IP
- Implement circuit breakers and retries with jitter (a retry sketch follows the server example below)
// Bounded worker pool
type Pool struct {
	sem chan struct{}
}

func NewPool(n int) *Pool { return &Pool{sem: make(chan struct{}, n)} }

func (p *Pool) Do(ctx context.Context, fn func() error) error {
	select {
	case p.sem <- struct{}{}:
		defer func() { <-p.sem }()
		return fn()
	case <-ctx.Done():
		return ctx.Err()
	}
}
HTTP server with sane timeouts:
srv := &http.Server{
	Addr:              ":8080",
	Handler:           mux,
	ReadTimeout:       5 * time.Second,
	ReadHeaderTimeout: 2 * time.Second,
	WriteTimeout:      5 * time.Second,
	IdleTimeout:       60 * time.Second,
}
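For the circuit breaker and retry bullet, a minimal retry-with-jitter sketch using only the standard library (time, math/rand, context); the attempt count and base delay are arbitrary, and a circuit breaker would sit around this.
// Retry with capped exponential backoff and jitter.
func retry(ctx context.Context, attempts int, base time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		backoff := base << i // exponential growth per attempt
		// sleep between 50% and 100% of the backoff to spread out retries
		sleep := backoff/2 + time.Duration(rand.Int63n(int64(backoff/2)+1))
		select {
		case <-time.After(sleep):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}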
Step 7: Reliability patterns
- Idempotency: store operation keys and return the same result on retry (a sketch follows the shutdown example below)
- At-least-once workers: handle duplicates safely
- Dead-letter queues: capture poison messages
- Graceful shutdown: close listeners, drain workers, flush metrics
// Graceful shutdown
go func() {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, os.Interrupt, syscall.SIGTERM)
	<-sig
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	_ = srv.Shutdown(ctx)
}()
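For the idempotency bullet, a minimal in-memory sketch; IdempotencyStore is a name invented here, and a production version would persist keys in the primary store or Redis with a TTL.
// Idempotency: remember the result of each operation key and replay it on retry.
type IdempotencyStore struct {
	mu   sync.Mutex
	seen map[string][]byte // operation key -> stored response
}

func NewIdempotencyStore() *IdempotencyStore {
	return &IdempotencyStore{seen: make(map[string][]byte)}
}

func (s *IdempotencyStore) Do(key string, fn func() ([]byte, error)) ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock() // serializes calls; fine for a sketch, not for a hot path
	if resp, ok := s.seen[key]; ok {
		return resp, nil // retry of a completed operation: replay the stored result
	}
	resp, err := fn()
	if err != nil {
		return nil, err // failed attempts are not recorded, so the client may retry
	}
	s.seen[key] = resp
	return resp, nil
}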
Step 8: Observability in Go
- Structured logs: log/slog or zerolog with request IDs
- Metrics: prometheus client, counters, histograms, RED metrics (a small sketch follows the logging example below)
- Tracing: OpenTelemetry SDK to instrument handlers and DB calls
- Profiling: net/http/pprof gated behind admin auth or dev only
logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
logger.Info("request", "path", r.URL.Path, "rid", reqID)
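For the metrics bullet, a small sketch with the Prometheus Go client; the metric names, labels, and the observe helper are only an example.
// RED-style metrics: request rate, errors (via the code label), and duration.
// Assumes github.com/prometheus/client_golang (prometheus, promauto) plus strconv and time.
var (
	reqTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{Name: "http_requests_total"},
		[]string{"path", "code"},
	)
	reqDur = promauto.NewHistogramVec(
		prometheus.HistogramOpts{Name: "http_request_duration_seconds", Buckets: prometheus.DefBuckets},
		[]string{"path"},
	)
)

func observe(path string, status int, took time.Duration) {
	reqTotal.WithLabelValues(path, strconv.Itoa(status)).Inc()
	reqDur.WithLabelValues(path).Observe(took.Seconds())
}

// Expose the default registry with promhttp.Handler() on a /metrics route.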
Step 9: Security and multi tenancy
- JWT or session-based auth, signed cookies with Secure and HttpOnly
- RBAC in handlers and the data access layer
- Per-tenant rate limits and quotas (a small sketch follows the middleware below)
- Secrets from environment or a secret manager, never hard coded
func RequireRole(next http.Handler, roles ...string) http.Handler {
	allowed := make(map[string]bool, len(roles))
	for _, r := range roles { allowed[r] = true }
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !allowed[RoleFrom(r.Context())] {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}
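For per-tenant rate limits, a sketch with golang.org/x/time/rate and an in-process map; TenantLimits is a name invented here, and a multi-instance deployment would use a shared store like the Redis limiter shown later.
// Per-tenant token buckets kept in process memory.
type TenantLimits struct {
	mu   sync.Mutex
	lims map[string]*rate.Limiter
	r    rate.Limit // tokens per second
	b    int        // burst
}

func NewTenantLimits(r rate.Limit, b int) *TenantLimits {
	return &TenantLimits{lims: make(map[string]*rate.Limiter), r: r, b: b}
}

func (t *TenantLimits) Allow(tenant string) bool {
	t.mu.Lock()
	lim, ok := t.lims[tenant]
	if !ok {
		lim = rate.NewLimiter(t.r, t.b)
		t.lims[tenant] = lim
	}
	t.mu.Unlock()
	return lim.Allow()
}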
Step 10: Capacity planning and scaling
- Vertical first, then horizontal
- Stateless API tier behind a load balancer
- Cache hot keys and paginate large reads
- Shard by key when write contention appears
- Background compaction, TTL, cold storage for old data
Speak to numbers. Example: 5k QPS at a p95 of 100 ms with a 90 percent hot-cache hit rate; if each Go instance handles about 1k QPS on 2 vCPUs, start with 6 instances across 3 AZs, which covers peak load with headroom and spreads evenly.
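The same back-of-envelope math as a tiny helper; the 1.2 headroom factor is an assumption, not a rule.
// Instances needed: peak QPS times a headroom factor, divided by per-instance capacity.
// Assumes the math package.
func instancesNeeded(peakQPS, perInstanceQPS, headroom float64) int {
	return int(math.Ceil(peakQPS * headroom / perInstanceQPS))
}

// instancesNeeded(5000, 1000, 1.2) == 6, which also spreads evenly across 3 AZs.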
Example walkthrough in Go: Rate limiter service
Problem: per API key allow N requests per minute and reject extras.
Scale target: 10k QPS, keys in the hundreds of thousands, low latency.
Design choices
- API tier in Go, Redis as a central counter store
- Token bucket per key with burst capacity
- Local hot cache to reduce Redis load
- Lua script for atomic bucket updates
Sketch
type Limiter interface {
	Allow(ctx context.Context, key string, n int) (bool, error)
}

type RedisLimiter struct {
	rdb    *redis.Client
	script *redis.Script
	cap    int           // bucket capacity
	refill time.Duration // time to refill a full bucket
}

func NewRedisLimiter(rdb *redis.Client, cap int, refill time.Duration) *RedisLimiter {
	lua := `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local cap = tonumber(ARGV[2])
local refill = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])
local last_ts = tonumber(redis.call("HGET", key, "ts") or 0)
local tokens = tonumber(redis.call("HGET", key, "tok") or cap)
if last_ts == 0 then last_ts = now end
local delta = math.max(0, now - last_ts)
local add = delta * (cap / refill)
tokens = math.min(cap, tokens + add)
local ok = tokens >= cost
if ok then tokens = tokens - cost end
redis.call("HSET", key, "ts", now, "tok", tokens)
redis.call("PEXPIRE", key, refill * 2)
return ok and 1 or 0
`
	return &RedisLimiter{
		rdb:    rdb,
		script: redis.NewScript(lua),
		cap:    cap,
		refill: refill,
	}
}

func (l *RedisLimiter) Allow(ctx context.Context, key string, n int) (bool, error) {
	now := time.Now().UnixMilli()
	res, err := l.script.Run(ctx, l.rdb, []string{"ratel:" + key},
		now, l.cap, l.refill.Milliseconds(), n).Int()
	if err != nil {
		return false, err
	}
	return res == 1, nil
}
HTTP handler with a context timeout (metrics middleware would wrap this)
func handle(w http.ResponseWriter, r *http.Request, lim Limiter) {
	ctx, cancel := context.WithTimeout(r.Context(), 50*time.Millisecond)
	defer cancel()
	key := r.Header.Get("X-API-Key")
	ok, err := lim.Allow(ctx, key, 1)
	if err != nil {
		http.Error(w, "rate check error", http.StatusInternalServerError)
		return
	}
	if !ok {
		http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
		return
	}
	w.WriteHeader(http.StatusNoContent)
}
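A short wiring sketch for the pieces above, assuming the go-redis v9 client; the address, bucket size, and route are placeholders.
// Wire the limiter into a server (sketch). Values are placeholders.
rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
lim := NewRedisLimiter(rdb, 100, time.Minute) // 100 requests per key per minute

mux := http.NewServeMux()
mux.HandleFunc("/check", func(w http.ResponseWriter, r *http.Request) {
	handle(w, r, lim)
})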
Why this wins in an interview
- Clear API and SLA
- Atomic updates with Lua to avoid race conditions
- Bounded latency with context timeouts
- Easy to shard by key and scale horizontally
- Observability hooks fit naturally
Go specific patterns the interviewer wants to hear
context.Context
in every boundary- Interfaces for storage and external services to allow clean tests
errgroup
for parallel fan out with shared cancelationtime.Ticker
andtime.After
for background sweepers and timeouts- Minimize allocations in hot paths, reuse buffers with
sync.Pool
when profiling shows wins - Use
httptrace
andpprof
to find tail latency
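A short fan-out sketch with errgroup from golang.org/x/sync; the fetch function and the concurrency limit are placeholders.
// Fan out lookups with shared cancelation: the first error cancels the rest.
func fetchAll(ctx context.Context, keys []string, fetch func(context.Context, string) error) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(8) // bound concurrency instead of one goroutine per key, unbounded
	for _, k := range keys {
		k := k // capture loop variable (not needed on Go 1.22+)
		g.Go(func() error {
			return fetch(ctx, k)
		})
	}
	return g.Wait()
}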
Red flags to avoid
- Spawning unbounded goroutines
- Ignoring timeouts and retries
- Mixing logging formats across services
- Over-engineering before validating requirements
- Hard coding secrets or trusting user input
Interview day checklist
- Repeat the goal and constraints in your own words
- Draw the simple version first, then scale it stepwise
- Define APIs and data models up front
- Explain consistency tradeoffs per operation
- Show concurrency control, backpressure, and timeouts in Go
- Close with capacity math and failure scenarios
If you ever want a quick outline or a clean starter in Go while practicing system design, you can use a helper that captures your prompt and shows a short plan plus example code. It is handy when time is tight and you need to see a working shape fast. Check out StealthCoder for that use case: https://stealthcoder.app