Saga Pattern in Go — Temporal, Outbox Pattern & Debezium

Prerequisite: Part 8 of the System Design Masterclass. Read Part 7: Idempotent API Design first — compensating transactions in Saga must be idempotent. Answer-first: The Saga Pattern coordinates distributed transactions across microservices by decomposing a large transaction into a sequence of local transactions. If any step fails, the system automatically executes compensating transactions in reverse order to undo completed steps. Each local transaction must be idempotent. What Are the Problems with 2PC in Microservices? Answer-first: Two-Phase Commit (2PC) is a blocking protocol with a coordinator single point of failure. If the coordinator crashes between the Prepare and Commit phases, all participants are blocked indefinitely with locks held — a catastrophic failure mode in microservices. These are the same core banking distributed transaction challenges seen in legacy systems. ...

June 18, 2026 · 8 min · Tanh
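
As a rough illustration of the Saga entry above, the following minimal sketch runs local transactions in order and, on the first failure, runs the compensations of the completed steps in reverse. The `Step` type, `RunSaga`, and the step names are hypothetical; they are not taken from the article, and no Temporal, Outbox, or Debezium API is used.

```go
package main

import (
	"errors"
	"fmt"
)

// Step is one local transaction plus the compensation that undoes it.
// All names here are hypothetical; they do not come from the article.
type Step struct {
	Name       string
	Action     func() error
	Compensate func() error
}

// RunSaga executes steps in order. On the first failure it runs the
// compensations of the already-completed steps in reverse order and
// returns the original error, wrapped with the failing step's name.
func RunSaga(steps []Step) error {
	for i, s := range steps {
		if err := s.Action(); err != nil {
			for j := i - 1; j >= 0; j-- {
				// A real system would retry a failing compensation,
				// which is why compensations must be idempotent.
				_ = steps[j].Compensate()
			}
			return fmt.Errorf("step %q failed: %w", s.Name, err)
		}
	}
	return nil
}

// demo runs a three-step saga whose last step fails and returns the
// ordered log of actions and compensations that ran.
func demo() ([]string, error) {
	var log []string
	record := func(msg string) func() error {
		return func() error { log = append(log, msg); return nil }
	}
	steps := []Step{
		{"reserve", record("reserve"), record("undo reserve")},
		{"charge", record("charge"), record("undo charge")},
		{"ship", func() error { return errors.New("no stock") }, record("undo ship")},
	}
	err := RunSaga(steps)
	return log, err
}

func main() {
	log, err := demo()
	fmt.Println(log, err)
}
```

The failed step itself is not compensated in this sketch, because its local transaction never committed; only the steps before it are undone.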

Idempotent API Design in Go — Idempotency Key & Redis SetNX

Prerequisite: Part 7 of the System Design Masterclass. Read Part 6: Distributed Locks — concurrent duplicate request blocking relies on the same mutual exclusion primitives. Answer-first: API idempotency ensures that retrying an identical request (same Idempotency-Key) never produces additional side effects beyond the first execution. This is foundational for payment APIs where network timeouts force client retries, and a duplicate execution would mean a double charge. What Is an Idempotency Key? Answer-first: An Idempotency Key is a unique token — typically UUID v4 — generated by the client and attached as an Idempotency-Key HTTP header. The server uses this key to detect duplicate requests: if the key has been seen before, return the cached response from the first execution without re-executing the handler. ...

June 18, 2026 · 8 min · Tanh
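
The duplicate-detection idea in the entry above can be sketched as follows. This is an assumption-laden stand-in: an in-memory map guarded by a mutex replaces the Redis-backed store named in the title, and `Store`, `Do`, and the key `key-123` are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Store caches the first response per idempotency key. It is an
// in-memory stand-in for a Redis-backed store (an assumption made to
// keep the sketch dependency-free).
type Store struct {
	mu        sync.Mutex
	responses map[string]string
}

// NewStore returns an empty Store.
func NewStore() *Store {
	return &Store{responses: make(map[string]string)}
}

// Do runs handler only the first time a key is seen; later calls with
// the same key return the cached response without re-executing handler.
func (s *Store) Do(key string, handler func() string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if resp, ok := s.responses[key]; ok {
		return resp
	}
	resp := handler()
	s.responses[key] = resp
	return resp
}

// chargeTwice simulates a client retrying the same request after a
// timeout. It reports how often the charge ran and both responses.
func chargeTwice() (charges int, first, second string) {
	store := NewStore()
	charge := func() string {
		charges++
		return fmt.Sprintf("charged #%d", charges)
	}
	first = store.Do("key-123", charge)
	second = store.Do("key-123", charge)
	return charges, first, second
}

func main() {
	fmt.Println(chargeTwice())
}
```

Holding one lock while the handler runs serializes every request, which keeps the sketch short but would not suit production; a real store would lock per key and expire entries.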

Distributed Locks in Go — Redlock Math, etcd & Split-Brain

Prerequisite: Part 6 of the System Design Masterclass. Read Part 5: Kafka & Event-Driven to understand event sourcing patterns before tackling lock coordination. Answer-first: Distributed locks solve the mutual exclusion problem across independent servers — ensuring only one server can modify a shared resource at a time. Redis Redlock provides high-performance locking using majority quorum across multiple master nodes; etcd provides stronger guarantees via Raft consensus at the cost of higher latency. ...

June 18, 2026 · 8 min · Tanh

Kafka Worker Pool in Go — Backpressure & Exactly-Once

Prerequisite: Part 5 of the System Design Masterclass. Read Part 4: Database Scaling to understand the storage tier that persisted events are written to. Answer-first: Event-Driven Architecture decouples services through asynchronous communication via a durable message log. In Go, goroutines and buffered channels implement natural backpressure: when consumers fall behind producers, the channel fills up and blocks the producer, throttling the ingest rate automatically. Kafka vs RabbitMQ: When to Use Each? Answer-first: Kafka is a distributed commit log: messages are retained for a configurable retention period rather than deleted on consumption, consumers manage their own offsets, and replay is possible. RabbitMQ is a message broker: messages are deleted after acknowledgment, the broker handles routing complexity, and delivery is push-based. They solve different problems. ...

June 18, 2026 · 8 min · Tanh
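
The buffered-channel backpressure described in the entry above can be sketched with the standard library alone. The function name `process` and its parameters are hypothetical, and a slice of integers stands in for Kafka messages; no Kafka client is involved.

```go
package main

import (
	"fmt"
	"sync"
)

// process fans jobs out to `workers` goroutines through a buffered
// channel. When the buffer is full, the send in the producer loop
// blocks, so the producer can never run ahead of the workers by more
// than `buffer` items. It returns the sum of all processed jobs.
func process(jobs []int, workers, buffer int) int {
	ch := make(chan int, buffer)
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		sum int
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range ch {
				mu.Lock()
				sum += j
				mu.Unlock()
			}
		}()
	}
	for _, j := range jobs {
		ch <- j // blocks while the buffer is full
	}
	close(ch) // lets the workers' range loops finish
	wg.Wait()
	return sum
}

func main() {
	fmt.Println(process([]int{1, 2, 3, 4, 5}, 2, 1))
}
```

The buffer size is the tuning knob: a larger buffer absorbs bursts, while a smaller one makes the producer feel consumer slowness sooner.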

Database Sharding in Go — TiDB, PostgreSQL & Connection Pools

Prerequisite: Part 4 of the System Design Masterclass. Read Part 3: Caching Strategies to understand the cache layer before examining storage. Answer-first: Database sharding distributes data horizontally across independent partitions (shards) based on a shard key, reducing write contention and enabling linear storage growth. Choosing the wrong shard key leads to hot spots that can be worse than no sharding at all. Vertical vs Horizontal Scaling — When to Switch? Answer-first: Vertical scaling (scale-up) increases resources on a single server — simple but has a hard physical ceiling and non-linear cost growth. Horizontal scaling (scale-out) adds more servers — no theoretical ceiling, linear cost, but significantly higher operational complexity. ...

June 18, 2026 · 8 min · Tanh
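
To make the shard-key warning in the entry above concrete, here is a minimal sketch of hash-based shard selection. It assumes plain modulo hashing with FNV-1a, which is one common choice and not necessarily what the article, TiDB, or PostgreSQL uses; `shardFor`, `distribution`, and `maxCount` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a shard key to one of n shards using FNV-1a hashing.
func shardFor(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return h.Sum32() % n
}

// distribution counts how many keys land on each of the n shards.
func distribution(keys []string, n uint32) []int {
	counts := make([]int, n)
	for _, k := range keys {
		counts[shardFor(k, n)]++
	}
	return counts
}

// maxCount returns the load of the busiest shard.
func maxCount(counts []int) int {
	m := 0
	for _, c := range counts {
		if c > m {
			m = c
		}
	}
	return m
}

func main() {
	var byUser, byCountry []string
	for i := 0; i < 1000; i++ {
		byUser = append(byUser, fmt.Sprintf("user-%d", i))
		byCountry = append(byCountry, "VN") // low-cardinality key
	}
	fmt.Println("user_id key:", distribution(byUser, 4))
	fmt.Println("country key:", distribution(byCountry, 4))
}
```

A high-cardinality key such as a user ID spreads rows across shards, while a key with few distinct values sends every row to the same shard, which is the hot spot the entry warns about.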

Caching Strategies in Go — Cache Stampede, XFetch & Redis LFU

Prerequisite: Part 3 of the System Design Masterclass. Read Part 2: Load Balancing L4/L7 to understand the traffic layer before diving into the caching tier. Answer-first: Effective caching strategy selection hinges on the acceptable consistency window and the read/write access pattern of the workload. Write-Through suits financial records; Write-Behind suits analytics and event counters; Cache-Aside is the default for read-heavy API responses. How Does Cache Stampede Happen? Answer-first: Cache Stampede (thundering herd) occurs when a popular cached key expires and many concurrent goroutines detect the cache miss at the same moment, then all query the database at once. The burst of duplicate DB queries can exceed connection pool capacity and cause cascading failure. ...

June 18, 2026 · 9 min · Tanh
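
One common mitigation for the stampede described in the entry above is request coalescing: concurrent misses on the same key share a single database load. The sketch below is a hand-rolled, standard-library version of that idea, not the XFetch algorithm from the title and not the article's code; `Group`, `Do`, `Waiters`, and the key `user:1` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call is one in-flight load that later callers can wait on.
type call struct {
	done    chan struct{}
	val     string
	waiters int
}

// Group coalesces concurrent loads of the same key into one call.
type Group struct {
	mu    sync.Mutex
	calls map[string]*call
}

// NewGroup returns an empty Group.
func NewGroup() *Group { return &Group{calls: make(map[string]*call)} }

// Do runs load once per key at a time; callers that arrive while a load
// is in flight wait for it and share its result.
func (g *Group) Do(key string, load func() string) string {
	g.mu.Lock()
	if c, ok := g.calls[key]; ok {
		c.waiters++
		g.mu.Unlock()
		<-c.done
		return c.val
	}
	c := &call{done: make(chan struct{})}
	g.calls[key] = c
	g.mu.Unlock()

	c.val = load()

	g.mu.Lock()
	delete(g.calls, key)
	g.mu.Unlock()
	close(c.done) // publishes c.val to every waiter
	return c.val
}

// Waiters reports how many callers are waiting on the in-flight load.
func (g *Group) Waiters(key string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	if c, ok := g.calls[key]; ok {
		return c.waiters
	}
	return 0
}

// stampede fires n concurrent misses for one key and returns how many
// times the simulated database load ran, plus every caller's result.
func stampede(n int) (int64, []string) {
	g := NewGroup()
	release := make(chan struct{})
	var loads int64
	load := func() string {
		atomic.AddInt64(&loads, 1)
		<-release // simulate a slow query
		return "row"
	}
	results := make([]string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = g.Do("user:1", load)
		}(i)
	}
	// Hold the query open until every other caller has joined it.
	for g.Waiters("user:1") < n-1 {
		time.Sleep(time.Millisecond)
	}
	close(release)
	wg.Wait()
	return atomic.LoadInt64(&loads), results
}

func main() {
	loads, results := stampede(10)
	fmt.Println(loads, len(results))
}
```

The `Waiters` method exists only so the demo can hold the slow query open until all callers have joined; a production version would more likely use an existing coalescing package than this hand-written one.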

Load Balancing L4/L7 in Go — DSR, Rate Limiting & API Gateway

Prerequisite: Part 2 of the System Design Masterclass. Read Part 1: System Design Thinking first to understand foundational trade-off frameworks. Answer-first: L4 load balancing routes traffic by transport-layer (IP/TCP/UDP) metadata — minimal CPU overhead but limited intelligence. L7 load balancing inspects HTTP headers, paths, and cookies — enables content-based routing and advanced health checks at the cost of higher processing overhead per request. L4 vs L7 Load Balancing — The Definitive Comparison Answer-first: The fundamental difference is where in the network stack the routing decision is made. L4 (Transport Layer) routes at TCP/UDP level using IP+port tuples. L7 (Application Layer) routes at HTTP level using headers, URLs, and payloads. ...

June 18, 2026 · 9 min · Tanh

Go System Design: CAP, PACELC & Clean Architecture Primer

Prerequisite: This is Part 1 of the System Design Masterclass series. Familiarity with basic distributed systems concepts and Go syntax is assumed. Answer-first: Sound system design thinking is fundamentally about evaluating and selecting trade-offs across performance, reliability, and cost. No system is perfect — architects optimize for the constraints imposed by real business requirements and technical realities. How Do You Build System Design Thinking? Answer-first: System design mastery is built on three pillars: mastering foundational theorems (CAP, PACELC), practicing trade-off analysis on real-world case studies, and repeatedly decomposing large problems into measurable, independently scalable components. ...

June 18, 2026 · 9 min · Tanh

Tech Radar 17/06: Kratos Clean Architecture & Dapr Pub/Sub

Welcome back to the Tech Radar bulletin. Last week we dissected how Kratos and Dapr v1.15 solve State Collisions via ETags. This week we go one layer deeper: how do you structure the entire codebase so that Kratos, Wire, and Dapr Pub/Sub compose cleanly — and how do you keep that architecture testable, resilient, and production-safe? 1. The Four Layers of Kratos Clean Architecture Answer-first: Kratos enforces a four-layer Clean Architecture — api, service, biz, and data — where business logic in biz is completely isolated from transport and infrastructure. Each layer communicates only with the layer adjacent to it, and only through interfaces. ...

June 17, 2026 · 6 min · Lê Tuấn Anh

Part 8: Zero-Downtime Map Updates & Multi-Region Kubernetes

Writing a fast algorithm is only half the battle. The true test of a Principal Engineer is deploying a massive, stateful Routing Engine to the cloud without causing a single second of downtime during map updates or infrastructure failures. Answer-first: You cannot treat GraphHopper like a stateless web server. Updating the OpenStreetMap data takes 30 minutes of heavy computation. You MUST decouple the map build process using Kubernetes Jobs, inject the pre-computed 50GB cache via initContainers, and switch traffic instantly using Blue-Green Deployments. ...

June 15, 2026 · 5 min · Lê Tuấn Anh