Communication Protocols — gRPC vs REST vs GraphQL in Go Microservices

Prerequisite: This is Part 12 of the System Design Masterclass. Previous parts built the reliability patterns; this part compares communication protocols and data formats for microservice communication.
Answer-first: gRPC is optimized for internal microservices, using binary Protobuf serialization over multiplexed HTTP/2 or HTTP/3 streams. REST uses standard JSON over HTTP/1.1 or HTTP/2 and serves as the default for public APIs. GraphQL operates as an aggregator at the API gateway or Backend-for-Frontend (BFF) layer, letting clients request only the fields they need, but it requires query complexity limits and DataLoader batching to prevent server degradation. ...

June 18, 2026 · 10 min · Tanh

Go Security & API Rate Limiting — Token Bucket, Leaky Bucket & Redis Lua

Prerequisite: This is Part 11 of the System Design Masterclass. Previous parts built the core components; this part covers securing APIs and managing client traffic spikes at scale.
Answer-first: API rate limiting defends backend services by restricting request volume. Security requires a layered defense: Web Application Firewalls (WAF) block edge-level volumetric spikes, API Gateways manage L7 credentials and quotas, and application middleware enforces fine-grained business limits. Client identification must rely on validated, secure IP parsing (using the PROXY protocol or rightmost X-Forwarded-For checks). ...

June 18, 2026 · 9 min · Tanh

Go Observability & pprof — Memory Leaks, CPU Profiling & GODEBUG

Prerequisite: This is Part 10 of the System Design Masterclass. Previous parts built the architecture; this part teaches you how to see inside a running system and diagnose production performance issues.
Answer-first: Go’s built-in pprof profiler provides CPU sampling, heap allocation analysis, goroutine stack inspection, and block profiling, all available as HTTP endpoints in running production services with minimal overhead. Diffing two heap snapshots is usually the fastest way to identify memory leaks. ...

June 18, 2026 · 9 min · Tanh

Consistent Hashing in Go — Virtual Nodes & CRC32 Ring

Prerequisite: Part 9 of the System Design Masterclass. Read Part 4: Database Scaling for context on horizontal partitioning strategies.
Answer-first: Consistent Hashing minimizes key remapping when cluster membership changes. Adding or removing one node in a modulo-hash cluster remaps nearly all keys, causing a catastrophic cache miss storm. Consistent Hashing remaps only about $K/N$ keys (for $K$ keys spread across $N$ nodes), the theoretical minimum.
Why Modulo Hashing Fails When Scaling
Answer-first: When a node is added, hash(key) % N becomes hash(key) % (N+1), so nearly all key-to-node mappings change. This creates a massive cache miss storm, because nearly the entire working set must be reloaded from the database at once. ...

June 18, 2026 · 8 min · Tanh
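
The remapping contrast described in the Consistent Hashing excerpt above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the article's implementation: the node names, the `user:N` key format, the 100 virtual nodes per physical node, and the helper names (`Ring`, `CompareRemap`) are all hypothetical. It grows a four-node cluster to five and measures how many keys change owner under `hash(key) % N` versus a CRC32 ring.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

func hashKey(s string) uint32 { return crc32.ChecksumIEEE([]byte(s)) }

// moduloNode picks a node with hash(key) % N.
func moduloNode(key string, nodes []string) string {
	return nodes[hashKey(key)%uint32(len(nodes))]
}

// Ring is a minimal consistent-hash ring with virtual nodes.
type Ring struct {
	points []uint32          // sorted virtual-node positions
	owner  map[uint32]string // position -> physical node
}

// NewRing places `replicas` virtual nodes per physical node on the ring.
func NewRing(nodes []string, replicas int) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			h := hashKey(fmt.Sprintf("%s#%d", n, i))
			if _, seen := r.owner[h]; !seen {
				r.points = append(r.points, h)
			}
			r.owner[h] = n
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Get returns the owner of the first virtual node at or after the key's hash.
func (r *Ring) Get(key string) string {
	h := hashKey(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around to the start of the ring
	}
	return r.owner[r.points[i]]
}

// CompareRemap adds a fifth node to a four-node cluster and reports the
// fraction of keys that change owner under each scheme, plus whether every
// key that moved on the ring moved to the new node.
func CompareRemap(numKeys int) (modulo, ring float64, onlyToNew bool) {
	before := []string{"node-a", "node-b", "node-c", "node-d"}
	after := append(append([]string{}, before...), "node-e")
	rb, ra := NewRing(before, 100), NewRing(after, 100)
	var mMoved, rMoved int
	onlyToNew = true
	for i := 0; i < numKeys; i++ {
		k := fmt.Sprintf("user:%d", i)
		if moduloNode(k, before) != moduloNode(k, after) {
			mMoved++
		}
		if was, now := rb.Get(k), ra.Get(k); was != now {
			rMoved++
			if now != "node-e" {
				onlyToNew = false
			}
		}
	}
	return float64(mMoved) / float64(numKeys), float64(rMoved) / float64(numKeys), onlyToNew
}

func main() {
	m, r, ok := CompareRemap(10000)
	fmt.Printf("modulo remapped %.0f%%, ring remapped %.0f%%, moved only to new node: %v\n", m*100, r*100, ok)
}
```

With uniformly distributed hashes, going from 4 to 5 nodes under modulo hashing should move roughly 80% of keys, while the ring should move roughly one fifth. The exact percentages depend on how CRC32 spreads these particular strings. On the ring, every key that moves lands on the newly added node, which is what keeps the remapping minimal.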

Saga Pattern in Go — Temporal, Outbox Pattern & Debezium

Prerequisite: Part 8 of the System Design Masterclass. Read Part 7: Idempotent API Design first, because compensating transactions in a Saga must be idempotent.
Answer-first: The Saga Pattern coordinates distributed transactions across microservices by decomposing a large transaction into a sequence of local transactions. If any step fails, the system automatically executes compensating transactions in reverse order to undo the completed steps. Each local transaction must be idempotent.
What Are the Problems with 2PC in Microservices?
Answer-first: Two-Phase Commit (2PC) is a blocking protocol whose coordinator is a single point of failure. If the coordinator crashes between the Prepare and Commit phases, all participants are blocked indefinitely with locks held, a catastrophic failure mode in microservices. Legacy core banking systems face the same distributed transaction challenges. ...

June 18, 2026 · 8 min · Tanh
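
The reverse-order compensation described in the Saga Pattern excerpt above can be sketched as a small in-process orchestrator. This is an illustration only: the `Step` type, `RunSaga`, and the three order-flow steps are hypothetical names, and the sketch does not use Temporal, an outbox table, or Debezium. A production saga would also persist its progress and retry compensations, which is why they must be idempotent.

```go
package main

import (
	"errors"
	"fmt"
)

// Step pairs a local transaction with its compensating action.
type Step struct {
	Name       string
	Action     func() error
	Compensate func() error
}

// RunSaga executes steps in order. On the first failure it runs the
// compensations of the already-completed steps in reverse order.
func RunSaga(steps []Step) error {
	for i, s := range steps {
		if err := s.Action(); err != nil {
			for j := i - 1; j >= 0; j-- {
				if cerr := steps[j].Compensate(); cerr != nil {
					return fmt.Errorf("step %q failed (%v); compensating %q also failed: %w",
						s.Name, err, steps[j].Name, cerr)
				}
			}
			return fmt.Errorf("saga rolled back after step %q: %w", s.Name, err)
		}
	}
	return nil
}

// DemoOrderSaga runs a hypothetical three-step order flow whose last step
// fails, and returns the ordered log of actions and compensations.
func DemoOrderSaga() ([]string, error) {
	var log []string
	record := func(msg string) func() error {
		return func() error { log = append(log, msg); return nil }
	}
	steps := []Step{
		{Name: "reserve-stock", Action: record("reserve-stock"), Compensate: record("release-stock")},
		{Name: "charge-card", Action: record("charge-card"), Compensate: record("refund-card")},
		{
			Name:       "create-shipment",
			Action:     func() error { return errors.New("carrier unavailable") },
			Compensate: record("cancel-shipment"),
		},
	}
	err := RunSaga(steps)
	return log, err
}

func main() {
	log, err := DemoOrderSaga()
	fmt.Println(log)
	fmt.Println(err)
}
```

In the demo, the shipment step fails after stock was reserved and the card was charged, so the log records `refund-card` followed by `release-stock`. The failed step's own compensation is not run, because that step never completed.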

Idempotent API Design in Go — Idempotency Key & Redis SetNX

Prerequisite: Part 7 of the System Design Masterclass. Read Part 6: Distributed Locks first, because blocking concurrent duplicate requests relies on the same mutual exclusion primitives.
Answer-first: API idempotency ensures that retrying an identical request (same Idempotency-Key) never produces additional side effects beyond the first execution. This is foundational for payment APIs, where network timeouts force client retries and a duplicate execution would mean a double charge.
What Is an Idempotency Key?
Answer-first: An Idempotency Key is a unique token, typically a UUID v4, generated by the client and attached as an Idempotency-Key HTTP header. The server uses this key to detect duplicate requests: if the key has been seen before, it returns the cached response from the first execution without re-executing the handler. ...

June 18, 2026 · 8 min · Tanh
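
The duplicate-detection flow described in the Idempotent API Design excerpt above can be sketched with an in-memory map standing in for Redis. This is a hedged illustration, not the article's code: `Store`, `Do`, `ErrInProgress`, and the `key-123` value are hypothetical, and claiming the key under a mutex only approximates what an atomic Redis SET with the NX option would provide across servers.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrInProgress signals that a request with the same key is still executing.
var ErrInProgress = errors.New("request with this Idempotency-Key is in progress")

type entry struct {
	done     bool
	response string
}

// Store is an in-memory stand-in for a shared store such as Redis.
type Store struct {
	mu      sync.Mutex
	entries map[string]*entry
}

func NewStore() *Store { return &Store{entries: make(map[string]*entry)} }

// Do runs handler at most once per key and replays the stored response
// for any later request that carries the same key.
func (s *Store) Do(key string, handler func() (string, error)) (string, error) {
	s.mu.Lock()
	if e, ok := s.entries[key]; ok {
		done, resp := e.done, e.response
		s.mu.Unlock()
		if !done {
			return "", ErrInProgress // concurrent duplicate
		}
		return resp, nil // replay the first response
	}
	e := &entry{}
	s.entries[key] = e // claim the key before doing any work
	s.mu.Unlock()

	resp, err := handler()

	s.mu.Lock()
	defer s.mu.Unlock()
	if err != nil {
		delete(s.entries, key) // release the claim so the client can retry
		return "", err
	}
	e.done, e.response = true, resp
	return resp, nil
}

// DemoCharge sends the same payment request twice with one key and reports
// how many times the charge handler actually ran.
func DemoCharge() (charges int, first, retry string) {
	s := NewStore()
	charge := func() (string, error) {
		charges++
		return fmt.Sprintf("charge-%d accepted", charges), nil
	}
	first, _ = s.Do("key-123", charge)
	retry, _ = s.Do("key-123", charge)
	return charges, first, retry
}

func main() {
	charges, first, retry := DemoCharge()
	fmt.Println(charges, first, retry) // prints: 1 charge-1 accepted charge-1 accepted
}
```

Deleting the claim when the handler fails is one possible policy. Some APIs instead store and replay the error response, and a real store would also expire keys after a retention window.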

Distributed Locks in Go — Redlock Math, etcd & Split-Brain

Prerequisite: Part 6 of the System Design Masterclass. Read Part 5: Kafka & Event-Driven to understand event sourcing patterns before tackling lock coordination.
Answer-first: Distributed locks solve the mutual exclusion problem across independent servers, ensuring only one server can modify a shared resource at a time. Redis Redlock provides high-performance locking using a majority quorum across multiple master nodes; etcd provides stronger guarantees via Raft consensus at the cost of higher latency. ...

June 18, 2026 · 8 min · Tanh

Kafka Worker Pool in Go — Backpressure & Exactly-Once

Prerequisite: Part 5 of the System Design Masterclass. Read Part 4: Database Scaling to understand the storage tier to which persisted events are written.
Answer-first: Event-Driven Architecture decouples services through asynchronous communication via a durable message log. In Go, goroutines and buffered channels implement natural backpressure: when consumers fall behind producers, the channel fills up and blocks the producer, throttling the ingest rate automatically.
Kafka vs RabbitMQ: When to Use Each?
Answer-first: Kafka is a distributed commit log: messages are retained according to a configurable retention policy regardless of consumption, consumers manage their own offsets, and replay is possible. RabbitMQ is a message broker: messages are deleted after acknowledgment, the broker handles routing complexity, and delivery is push-based. They solve different problems. ...

June 18, 2026 · 8 min · Tanh
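
The channel-based backpressure described in the Kafka Worker Pool excerpt above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: a plain loop stands in for the Kafka consumer, and `RunPool` and its parameters are hypothetical names. The producer's send blocks whenever the buffered channel is full, so it cannot run ahead of the workers by more than the buffer size.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// RunPool feeds n jobs through a buffered channel to a fixed set of workers
// and returns how many jobs were handled.
func RunPool(n, workers, buffer int, handle func(job int)) int64 {
	jobs := make(chan int, buffer)
	var handled int64
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobs { // exits once jobs is closed and drained
				handle(job)
				atomic.AddInt64(&handled, 1)
			}
		}()
	}

	// Producer: in a real service this loop would read from the message log.
	for i := 0; i < n; i++ {
		jobs <- i // blocks while the buffer is full, which throttles the producer
	}
	close(jobs)
	wg.Wait()
	return atomic.LoadInt64(&handled)
}

func main() {
	start := time.Now()
	handled := RunPool(20, 2, 4, func(int) { time.Sleep(5 * time.Millisecond) })
	fmt.Printf("handled %d jobs in %v\n", handled, time.Since(start))
}
```

With 2 workers, a buffer of 4, and 5 ms per job, the producer spends most of its time blocked on the send, so total runtime is governed by the workers rather than by how fast jobs can be produced.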

Database Sharding in Go — TiDB, PostgreSQL & Connection Pools

Prerequisite: Part 4 of the System Design Masterclass. Read Part 3: Caching Strategies to understand the cache layer before examining storage.
Answer-first: Database sharding distributes data horizontally across independent partitions (shards) based on a shard key, reducing write contention and enabling linear storage growth. Choosing the wrong shard key leads to hot spots that can be worse than no sharding at all.
Vertical vs Horizontal Scaling: When to Switch?
Answer-first: Vertical scaling (scale-up) increases resources on a single server; it is simple but has a hard physical ceiling and non-linear cost growth. Horizontal scaling (scale-out) adds more servers; it has no theoretical ceiling and roughly linear cost, but significantly higher operational complexity. ...

June 18, 2026 · 8 min · Tanh

Caching Strategies in Go — Cache Stampede, XFetch & Redis LFU

Prerequisite: Part 3 of the System Design Masterclass. Read Part 2: Load Balancing L4/L7 to understand the traffic layer before diving into the caching tier.
Answer-first: Choosing an effective caching strategy hinges on the acceptable consistency window and the read/write access pattern of the workload. Write-Through suits financial records; Write-Behind suits analytics and event counters; Cache-Aside is the default for read-heavy API responses.
How Does Cache Stampede Happen?
Answer-first: Cache Stampede (thundering herd) occurs when a popular cached key expires and many concurrent goroutines detect the cache miss at the same moment, then all query the database simultaneously. The burst of duplicate DB queries can exceed connection pool capacity and cause cascading failure. ...

June 18, 2026 · 9 min · Tanh
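
The stampede described in the Caching Strategies excerpt above, and one common mitigation, can be sketched as follows. This is an assumption-laden illustration: it shows request coalescing (one goroutine loads, the rest wait for its result), which is a different technique from the XFetch approach named in the post title, and `Cache`, `Get`, and `DemoStampede` are hypothetical names. The cache has no TTL, and the 20 ms sleep stands in for a slow database query.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight load that other goroutines can wait on.
type call struct {
	done chan struct{}
	val  string
	err  error
}

// Cache coalesces concurrent misses for the same key into a single load.
type Cache struct {
	mu       sync.Mutex
	data     map[string]string
	inflight map[string]*call
}

func NewCache() *Cache {
	return &Cache{data: make(map[string]string), inflight: make(map[string]*call)}
}

// Get returns the cached value, joins an in-flight load, or becomes the
// single loader for the key.
func (c *Cache) Get(key string, load func(string) (string, error)) (string, error) {
	c.mu.Lock()
	if v, ok := c.data[key]; ok {
		c.mu.Unlock()
		return v, nil
	}
	if cl, ok := c.inflight[key]; ok {
		c.mu.Unlock()
		<-cl.done // wait for the loader's result instead of hitting the DB
		return cl.val, cl.err
	}
	cl := &call{done: make(chan struct{})}
	c.inflight[key] = cl
	c.mu.Unlock()

	cl.val, cl.err = load(key)

	c.mu.Lock()
	if cl.err == nil {
		c.data[key] = cl.val
	}
	delete(c.inflight, key)
	c.mu.Unlock()
	close(cl.done) // release the waiters
	return cl.val, cl.err
}

// DemoStampede has many goroutines miss on the same key at once and
// returns how many times the simulated database was queried.
func DemoStampede(goroutines int) int64 {
	c := NewCache()
	var loads int64
	load := func(key string) (string, error) {
		atomic.AddInt64(&loads, 1)
		time.Sleep(20 * time.Millisecond) // simulated slow DB query
		return "value-for-" + key, nil
	}
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Get("hot-key", load)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&loads)
}

func main() {
	fmt.Println("database loads:", DemoStampede(50)) // prints: database loads: 1
}
```

Without the `inflight` map, each of the 50 goroutines could issue its own query for `hot-key`. With it, the database sees one query, and the other callers either wait on that load or read the cached value once it has been stored.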