How to Solve System Design Questions in Go (Golang)

2025-08-18

System design interviews reward clear thinking, strong tradeoff skills, and practical implementation details. Go is a great language for this because it gives you lightweight concurrency, a simple standard library, and predictable performance. This guide shows a step-by-step way to approach any system design question, then maps each step to concrete Go patterns and code you can sketch on a whiteboard or in an editor.


Step 1: Clarify the problem and constraints

Ask for:

  • Core use cases and non-goals
  • Scale targets: QPS, daily active users, data size, expected growth
  • Latency and availability objectives
  • Read vs write ratio, burstiness, geo distribution, data retention

Write down success criteria. Tie every later choice back to these numbers. For example, 10M DAU at 10 reads each is about 100M reads per day, roughly 1.2k QPS on average, so plan for a 3 to 5x peak.


Step 2: Define APIs and contracts first

Nail the surface area before internals.

Example: URL shortener

POST /v1/links
Body: { "long_url": "https://example.com/..." }
Resp: { "short": "abc123" }

GET /v/:code -> 301 Location: <long_url>

In Go, model the contract with interfaces so you can swap storage later.

type LinkStore interface {
    Create(ctx context.Context, long string) (code string, err error)
    Resolve(ctx context.Context, code string) (long string, err error)
}
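
Before moving to internals, it helps to show how the contract plugs into HTTP. A minimal handler sketch against that interface, assuming encoding/json and net/http imports (the response shape mirrors the API above):

func createLink(store LinkStore) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        var req struct {
            LongURL string `json:"long_url"`
        }
        if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
            http.Error(w, "bad request", http.StatusBadRequest)
            return
        }
        code, err := store.Create(r.Context(), req.LongURL)
        if err != nil {
            http.Error(w, "create failed", http.StatusInternalServerError)
            return
        }
        w.Header().Set("Content-Type", "application/json")
        _ = json.NewEncoder(w).Encode(map[string]string{"short": code})
    }
}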

Step 3: Choose data models and storage

Map requirements to persistence.

  • Key value: Redis, DynamoDB, Badger, or PostgreSQL with a simple schema
  • Document or column oriented for flexible attributes
  • Time series for metrics and logs

Golang data model example:

type Link struct {
    Code     string
    LongURL  string
    Created  time.Time
    Creator  string
}

Schema in PostgreSQL:

CREATE TABLE links(
  code TEXT PRIMARY KEY,
  long_url TEXT NOT NULL,
  created TIMESTAMPTZ NOT NULL DEFAULT now(),
  creator TEXT
);
CREATE INDEX idx_links_created ON links(created);
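
A sketch of a PostgreSQL-backed LinkStore using database/sql; newCode is a hypothetical random base62 generator, and a real version would retry on a unique-key violation:

type PGStore struct {
    db *sql.DB
}

func (s *PGStore) Create(ctx context.Context, long string) (string, error) {
    code := newCode() // hypothetical base62 code generator
    _, err := s.db.ExecContext(ctx,
        `INSERT INTO links (code, long_url) VALUES ($1, $2)`, code, long)
    return code, err
}

func (s *PGStore) Resolve(ctx context.Context, code string) (string, error) {
    var long string
    err := s.db.QueryRowContext(ctx,
        `SELECT long_url FROM links WHERE code = $1`, code).Scan(&long)
    return long, err
}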

Step 4: Design the high level architecture

Start simple, then add scale features as needed:

  1. Client -> API Gateway or load balancer
  2. Stateless Go web tier
  3. Cache layer for hot reads
  4. Primary store for durability
  5. Async workers for slow tasks
  6. Observability stack

Say how each piece meets the SLA. Call out the single-writer vs multi-leader tradeoff, read replicas, and cache policies.
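
A wiring sketch for the simple version, assuming the PGStore and createLink pieces from earlier steps and a registered postgres driver (for example _ "github.com/lib/pq"); the scale features bolt on around this skeleton:

func main() {
    db, err := sql.Open("postgres", os.Getenv("DATABASE_URL"))
    if err != nil {
        log.Fatal(err)
    }
    store := &PGStore{db: db} // wrap with a cache layer in Step 5

    mux := http.NewServeMux()
    mux.Handle("/v1/links", createLink(store))
    // redirect handler, health checks, and /metrics attach here

    // swap in the timeout-configured http.Server from Step 6 for production
    log.Fatal(http.ListenAndServe(":8080", mux))
}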


Step 5: Plan caching, queues, and consistency

  • Caching: read-through for idempotent GETs, write-through for strong read-after-write consistency, TTL and invalidation rules
  • Queues: absorb bursts, decouple ingestion from processing
  • Consistency: pick per operation. Strong where needed, eventual when safe. Use idempotency keys on retries.

Go helpers:

// Read-through cache wrapper
type CachedStore struct {
    Store LinkStore
    Cache *ristretto.Cache
}

func (c *CachedStore) Resolve(ctx context.Context, code string) (string, error) {
    if v, ok := c.Cache.Get(code); ok {
        return v.(string), nil
    }
    long, err := c.Store.Resolve(ctx, code)
    if err == nil {
        c.Cache.Set(code, long, 1)
    }
    return long, err
}
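
One caveat with plain read-through: a hot key that misses can stampede the store. A sketch of the same Resolve with golang.org/x/sync/singleflight collapsing concurrent misses into a single lookup (assumes a group singleflight.Group field added to CachedStore):

func (c *CachedStore) Resolve(ctx context.Context, code string) (string, error) {
    if v, ok := c.Cache.Get(code); ok {
        return v.(string), nil
    }
    // One goroutine per code hits the store; concurrent callers share its result.
    v, err, _ := c.group.Do(code, func() (interface{}, error) {
        return c.Store.Resolve(ctx, code)
    })
    if err != nil {
        return "", err
    }
    long := v.(string)
    c.Cache.Set(code, long, 1)
    return long, nil
}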

Step 6: Concurrency, backpressure, and timeouts in Go

Make concurrency explicit and safe.

  • Use context.Context to propagate timeouts and cancellation
  • Prefer worker pools with bounded concurrency over unbounded goroutines
  • Add rate limits per user or per IP
  • Implement circuit breakers and retries with jitter (a retry sketch follows the pool below)

// Bounded worker pool
type Pool struct {
    sem chan struct{}
}

func NewPool(n int) *Pool { return &Pool{sem: make(chan struct{}, n)} }

func (p *Pool) Do(ctx context.Context, fn func() error) error {
    select {
    case p.sem <- struct{}{}:
        defer func() { <-p.sem }()
        return fn()
    case <-ctx.Done():
        return ctx.Err()
    }
}
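
And the retry sketch promised above: exponential backoff with full jitter, bounded by the caller's context (the base and ceiling values are illustrative; uses math/rand):

func retry(ctx context.Context, attempts int, fn func() error) error {
    base, maxWait := 50*time.Millisecond, 2*time.Second
    var err error
    for i := 0; i < attempts; i++ {
        if err = fn(); err == nil {
            return nil
        }
        backoff := base << i
        if backoff > maxWait {
            backoff = maxWait
        }
        // full jitter: sleep a uniform random duration in [0, backoff)
        sleep := time.Duration(rand.Int63n(int64(backoff)))
        select {
        case <-time.After(sleep):
        case <-ctx.Done():
            return ctx.Err()
        }
    }
    return err
}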

HTTP server with sane timeouts:

srv := &http.Server{
    Addr:              ":8080",
    Handler:           mux,
    ReadTimeout:       5 * time.Second,
    ReadHeaderTimeout: 2 * time.Second,
    WriteTimeout:      5 * time.Second,
    IdleTimeout:       60 * time.Second,
}

Step 7: Reliability patterns

  • Idempotency: store operation keys, return the same result on retry (a sketch follows the shutdown snippet below)
  • At-least-once workers: handle duplicates safely
  • Dead letter queues: capture poison messages
  • Graceful shutdown: close listeners, drain workers, flush metrics

// Graceful shutdown
go func() {
    sig := make(chan os.Signal, 1)
    signal.Notify(sig, os.Interrupt, syscall.SIGTERM)
    <-sig
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    _ = srv.Shutdown(ctx)
}()
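
For the idempotency bullet, a minimal in-memory sketch keyed by a client-supplied Idempotency-Key; a production version would use a shared store with a TTL and per-key locking (or singleflight) so concurrent duplicates also collapse:

type IdemStore struct {
    mu   sync.Mutex
    seen map[string]string // idempotency key -> cached result
}

func NewIdemStore() *IdemStore {
    return &IdemStore{seen: make(map[string]string)}
}

// Run executes fn once per key and replays the stored result on retries.
func (s *IdemStore) Run(key string, fn func() (string, error)) (string, error) {
    s.mu.Lock()
    if res, ok := s.seen[key]; ok {
        s.mu.Unlock()
        return res, nil
    }
    s.mu.Unlock()

    res, err := fn()
    if err != nil {
        return "", err // failed attempts are not recorded, so retries re-run
    }

    s.mu.Lock()
    s.seen[key] = res
    s.mu.Unlock()
    return res, nil
}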

Step 8: Observability in Go

  • Structured logs: log/slog or zerolog with request IDs
  • Metrics: Prometheus client with counters and histograms for RED metrics (a sketch follows the logging example)
  • Tracing: OpenTelemetry SDK to instrument handlers and DB calls
  • Profiling: net/http/pprof gated behind admin auth or dev only

logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
logger.Info("request", "path", r.URL.Path, "rid", reqID)
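
A Prometheus sketch to go with the metrics bullet, using github.com/prometheus/client_golang (the metric name, labels, and buckets are illustrative):

var reqDuration = prometheus.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "http_request_duration_seconds",
        Help:    "Request latency by path and status.",
        Buckets: prometheus.DefBuckets,
    },
    []string{"path", "status"},
)

func init() {
    prometheus.MustRegister(reqDuration)
    // expose scrapes with: mux.Handle("/metrics", promhttp.Handler())
}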

Step 9: Security and multi-tenancy

  • JWT or session-based auth, signed cookies with Secure and HttpOnly
  • RBAC on handlers and data access layer
  • Per-tenant rate limits and quotas (a limiter sketch follows the middleware below)
  • Secrets from environment or a secret manager, never hard coded

func RequireRole(next http.Handler, roles ...string) http.Handler {
    allowed := make(map[string]bool, len(roles))
    for _, r := range roles { allowed[r] = true }
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !allowed[RoleFrom(r.Context())] {
            http.Error(w, "forbidden", http.StatusForbidden)
            return
        }
        next.ServeHTTP(w, r)
    })
}
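
And the per-tenant limiter referenced above, a sketch on golang.org/x/time/rate with one token bucket per tenant (the 100 rps / 200 burst numbers are illustrative, and eviction of idle tenants is omitted):

type TenantLimits struct {
    mu   sync.Mutex
    lims map[string]*rate.Limiter
}

func (t *TenantLimits) Allow(tenant string) bool {
    t.mu.Lock()
    if t.lims == nil {
        t.lims = make(map[string]*rate.Limiter)
    }
    lim, ok := t.lims[tenant]
    if !ok {
        lim = rate.NewLimiter(rate.Limit(100), 200)
        t.lims[tenant] = lim
    }
    t.mu.Unlock()
    return lim.Allow()
}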

Step 10: Capacity planning and scaling

  • Vertical first, then horizontal
  • Stateless API tier behind a load balancer
  • Cache hot keys and paginate large reads
  • Shard by key when write contention appears
  • Background compaction, TTL, cold storage for old data

Speak to numbers. Example: 5k QPS at p95 100 ms with a 90 percent hot-cache hit rate; each Go instance handles 1k QPS on 2 vCPU, so five instances cover the load, and starting with 6 across 3 AZs adds headroom and failure tolerance.


Example walkthrough in Go: Rate limiter service

Problem: per API key allow N requests per minute and reject extras.

Scale target: 10k QPS, keys in the hundreds of thousands, low latency.

Design choices

  • API tier in Go, Redis as a central counter store
  • Token bucket per key with burst capacity
  • Local hot cache to reduce Redis load
  • Lua script for atomic bucket updates

Sketch

type Limiter interface {
    Allow(ctx context.Context, key string, n int) (bool, error)
}

type RedisLimiter struct {
    rdb    *redis.Client
    script *redis.Script
    cap    int           // bucket capacity
    refill time.Duration // time to refill a full bucket
}

func NewRedisLimiter(rdb *redis.Client, capacity int, refill time.Duration) *RedisLimiter {
    lua := `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local cap = tonumber(ARGV[2])
local refill = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])

local last_ts = tonumber(redis.call("HGET", key, "ts") or 0)
local tokens  = tonumber(redis.call("HGET", key, "tok") or cap)

if last_ts == 0 then last_ts = now end
local delta = math.max(0, now - last_ts)
local add = delta * (cap / refill)
tokens = math.min(cap, tokens + add)

local ok = tokens >= cost
if ok then tokens = tokens - cost end

redis.call("HSET", key, "ts", now, "tok", tokens)
redis.call("PEXPIRE", key, refill * 2)

return ok and 1 or 0
`
    return &RedisLimiter{
        rdb:    rdb,
        script: redis.NewScript(lua),
        cap:    capacity,
        refill: refill,
    }
}

func (l *RedisLimiter) Allow(ctx context.Context, key string, n int) (bool, error) {
    now := time.Now().UnixMilli()
    res, err := l.script.Run(ctx, l.rdb, []string{"ratel:" + key},
        now, l.cap, l.refill.Milliseconds(), n).Int()
    if err != nil {
        return false, err
    }
    return res == 1, nil
}
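
Wiring it up is two lines with go-redis (the address and limits are illustrative):

rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
var lim Limiter = NewRedisLimiter(rdb, 100, time.Minute) // 100 tokens per key per minute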

HTTP handler with context and metrics

func handle(w http.ResponseWriter, r *http.Request, lim Limiter) {
    ctx, cancel := context.WithTimeout(r.Context(), 50*time.Millisecond)
    defer cancel()

    key := r.Header.Get("X-API-Key")
    ok, err := lim.Allow(ctx, key, 1)
    if err != nil {
        http.Error(w, "rate check error", http.StatusInternalServerError)
        return
    }
    if !ok {
        http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
        return
    }
    w.WriteHeader(http.StatusNoContent)
}

Why this wins in an interview

  • Clear API and SLA
  • Atomic updates with Lua to avoid race conditions
  • Bounded latency with context timeouts
  • Easy to shard by key and scale horizontally
  • Observability hooks fit naturally

Go-specific patterns the interviewer wants to hear

  • context.Context in every boundary
  • Interfaces for storage and external services to allow clean tests
  • errgroup for parallel fan-out with shared cancellation (sketch after this list)
  • time.Ticker and time.After for background sweepers and timeouts
  • Minimize allocations in hot paths, reuse buffers with sync.Pool when profiling shows wins
  • Use httptrace and pprof to find tail latency
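
The errgroup fan-out sketch referenced in the list, assuming a hypothetical fetch helper; the first error cancels the shared context for the rest:

func fetchAll(ctx context.Context, urls []string) ([]string, error) {
    g, ctx := errgroup.WithContext(ctx)
    results := make([]string, len(urls))
    for i, u := range urls {
        i, u := i, u // capture loop variables (needed before Go 1.22)
        g.Go(func() error {
            body, err := fetch(ctx, u) // hypothetical fetch helper
            if err != nil {
                return err
            }
            results[i] = body
            return nil
        })
    }
    if err := g.Wait(); err != nil {
        return nil, err
    }
    return results, nil
}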

Red flags to avoid

  • Spawning unbounded goroutines
  • Ignoring timeouts and retries
  • Mixing logging formats across services
  • Over-engineering before validating requirements
  • Hard coding secrets or trusting user input

Interview day checklist

  • Repeat the goal and constraints in your own words
  • Draw the simple version first, then scale it stepwise
  • Define APIs and data models up front
  • Explain consistency tradeoffs per operation
  • Show concurrency control, backpressure, and timeouts in Go
  • Close with capacity math and failure scenarios

If you ever want a quick outline or a clean starter in Go while practicing system design, you can use a helper that captures your prompt and shows a short plan plus example code. It is handy when time is tight and you need to see a working shape fast. Check out StealthCoder for that use case: https://stealthcoder.app