Golang

Distributed Locks in Go — Redlock Math, etcd & Split-Brain

Prerequisite: Part 6 of the System Design Masterclass. Read Part 5: Kafka & Event-Driven to understand event sourcing patterns before tackling lock coordination. Answer-first: Distributed locks solve the mutual exclusion problem across independent servers — ensuring only one server can modify a shared resource at a time. Redis Redlock provides high-performance locking using majority quorum across multiple master nodes; etcd provides stronger guarantees via Raft consensus at the cost of higher latency. ...

Kafka Worker Pool in Go — Backpressure & Exactly-Once

Prerequisite: Part 5 of the System Design Masterclass. Read Part 4: Database Scaling to understand the storage tier that persisted events are written to. Answer-first: Event-Driven Architecture decouples services through asynchronous communication via a durable message log. In Go, goroutines and buffered channels implement natural backpressure — when consumers fall behind producers, the channel fills up and blocks the producer, throttling the ingest rate automatically. Kafka vs RabbitMQ — When to Use Each? Answer-first: Kafka is a distributed commit log — messages are retained indefinitely, consumers manage their own offsets, and replay is possible. RabbitMQ is a message broker — messages are deleted after acknowledgment, the broker handles routing complexity, push-based delivery. They solve different problems. ...

Database Sharding in Go — TiDB, PostgreSQL & Connection Pools

Prerequisite: Part 4 of the System Design Masterclass. Read Part 3: Caching Strategies to understand the cache layer before examining storage. Answer-first: Database sharding distributes data horizontally across independent partitions (shards) based on a shard key, reducing write contention and enabling linear storage growth. Choosing the wrong shard key leads to hot spots that can be worse than no sharding at all. Vertical vs Horizontal Scaling — When to Switch? Answer-first: Vertical scaling (scale-up) increases resources on a single server — simple but has a hard physical ceiling and non-linear cost growth. Horizontal scaling (scale-out) adds more servers — no theoretical ceiling, linear cost, but significantly higher operational complexity. ...

Caching Strategies in Go — Cache Stampede, XFetch & Redis LFU

Prerequisite: Part 3 of the System Design Masterclass. Read Part 2: Load Balancing L4/L7 to understand the traffic layer before diving into the caching tier. Answer-first: Effective caching strategy selection hinges on the acceptable consistency window and the read/write access pattern of the workload. Write-Through suits financial records; Write-Behind suits analytics and event counters; Cache-Aside is the default for read-heavy API responses. How Does Cache Stampede Happen? Answer-first: Cache Stampede (thundering herd) occurs when a popular cached key expires and multiple concurrent goroutines simultaneously detect a cache miss — then all query the database simultaneously. The burst of duplicate DB queries can exceed connection pool capacity and cause cascading failure. ...

Load Balancing L4/L7 in Go — DSR, Rate Limiting & API Gateway

Prerequisite: Part 2 of the System Design Masterclass. Read Part 1: System Design Thinking first to understand foundational trade-off frameworks. Answer-first: L4 load balancing routes traffic by transport-layer (IP/TCP/UDP) metadata — minimal CPU overhead but limited intelligence. L7 load balancing inspects HTTP headers, paths, and cookies — enables content-based routing and advanced health checks at the cost of higher processing overhead per request. L4 vs L7 Load Balancing — The Definitive Comparison Answer-first: The fundamental difference is where in the network stack the routing decision is made. L4 (Transport Layer) routes at TCP/UDP level using IP+port tuples. L7 (Application Layer) routes at HTTP level using headers, URLs, and payloads. ...

Go System Design: CAP, PACELC & Clean Architecture Primer

Prerequisite: This is Part 1 of the System Design Masterclass series. Familiarity with basic distributed systems concepts and Go syntax is assumed. Answer-first: Sound system design thinking is fundamentally about evaluating and selecting trade-offs across performance, reliability, and cost. No system is perfect — architects optimize for the constraints imposed by real business requirements and technical realities. How Do You Build System Design Thinking? Answer-first: System design mastery is built on three pillars: mastering foundational theorems (CAP, PACELC), practicing trade-off analysis on real-world case studies, and repeatedly decomposing large problems into measurable, independently scalable components. ...

Tech Radar 17/06: Kratos Clean Architecture & Dapr Pub/Sub

Welcome back to the Tech Radar bulletin. Last week we dissected how Kratos and Dapr v1.15 solve State Collisions via ETags. This week we go one layer deeper: how do you structure the entire codebase so that Kratos, Wire, and Dapr Pub/Sub compose cleanly â€” and how do you keep that architecture testable, resilient, and production-safe? 1. The Four Layers of Kratos Clean Architecture Answer-first: Kratos enforces a four-layer Clean Architecture â€” api, service, biz, and data â€” where business logic in biz is completely isolated from transport and infrastructure. Each layer communicates only with the layer adjacent to it, and only through interfaces. ...

Part 7: Load Testing and Performance Tuning for Production

Load testing is the final boss of System Design. A junior engineer runs a script, sees “20,000 RPS” with 0 errors, and assumes the system is ready. A Principal Engineer knows that unless you tune the Linux Kernel, bypass Coordinated Omission, and simulate realistic chaos, that number is a complete lie. Answer-first: Load testing a routing engine is not just about testing your Go code. It is a brutal stress test of the Linux Kernel network stack (sockets, TCP reuse, SOMAXCONN), the Go runtime scheduler, and the memory footprint of your load testing tool itself. ...

Part 6: Location Clustering with Uber H3 & Redis Semantic Caching

Caching an exact GPS coordinate is impossible. Because floating-point numbers are infinitely precise, two users standing 1 meter apart will have completely different coordinates (106.0001 vs 106.0002). If your Redis key is simply lat1,lng1:lat2,lng2, your Cache Hit Rate will forever remain at 0%. Answer-first: To survive massive scale, you must implement Semantic Caching. Instead of caching raw coordinates, use Uber H3 to “snap” coordinates into 100-meter hexagonal buckets. Your cache key becomes route:{h3_origin}:{h3_dest}. This instantly transforms a compute-heavy routing problem into a lightning-fast Redis memory lookup. ...

Part 4: Golang API & Microservices Integration (Kratos & Dapr)

Building a simple API that calls Graphhopper via http.Get is easy. Building a Principal-level API Gateway that survives 10,000 concurrent riders requesting routes without crashing is a masterclass in Distributed Systems. Answer-first: Graphhopper is a heavily CPU-bound downstream service. If your Golang API blindly accepts traffic and forwards it, a slight slowdown in Graphhopper will cause your Goroutines to pile up, exhausting your server’s RAM and triggering a cascading failure. You must implement a “Defense in Depth” strategy using Concurrency Bounding, Circuit Breakers, and Asynchronous Pub/Sub. ...

Part 2: Zero to Hero Environment Setup (Docker, OSM, Golang)

Setting up a local routing engine is notoriously difficult. Most generic tutorials offer a basic Docker command that crashes silently, leaving developers confused. In this guide, we bypass the basic “Hello World” setups. We will build a production-grade local environment integrating OpenStreetMap (OSM) data, a properly tuned Graphhopper (Java) Docker container, and a high-concurrency Golang API Gateway. 1. Downloading and Cropping Map Data Answer-first: Download raw OpenStreetMap data in .osm.pbf format from the Geofabrik server. To save gigabytes of RAM during local development, use osmium extract to crop the massive country-level map down to a single city bounding box. ...

Part 1: Core Routing Algorithms — A* & Dijkstra Visualized

When building a high-scale logistics or delivery system, generic algorithm tutorials often lead developers astray. They tell you that A* is universally better than Dijkstra. However, in the real world of Routing Engines and Distance Matrices, the truth is much more complex. In this first part of our masterclass, we will move beyond academic theory. We will visualize the exact lifecycle of a routing request—from snapping a GPS coordinate to the road, to bypassing traffic, and finally calculating routes in milliseconds using Contraction Hierarchies. ...

Tech Radar (14/06/2026): Kratos & Dapr State Management

Welcome back to the Tech Radar bulletin. In modern Microservices architecture, maintaining a system capable of communicating flexibly both externally (HTTP) and internally (gRPC) is an essential requirement. Simultaneously, State Management in distributed environments demands rigorous solutions to prevent data collisions. Today, we will dissect how to combine Go’s highly acclaimed Kratos framework with Dapr v1.15 to comprehensively solve this problem. 1. Kratos Dual-Protocol: HTTP & gRPC Running in Parallel Answer-first: The Kratos framework integrates with Dapr v1.15 State Management via the sidecar pattern, allowing HTTP and gRPC servers to run concurrently. To avoid state collisions when running dual-protocol, the system uses Dapr ETags via SaveStateWithETag for Optimistic Concurrency Control, and uses Middleware for Metadata synchronization. ...

Tech Radar (13/06/2026): Go 1.26 GC, K8s Pod Resizing & AI-Native

Welcome back to the Tech Radar bulletin, where we filter out the noise of the tech industry to uncover the genuine trends shaping future System Architecture. The second week of June 2026 witnessed three massive shifts, from core infrastructure (Go, Kubernetes) to the maturation of AI-Native architecture. From the perspective of a System Architect, these are updates you cannot ignore to optimize your High-Concurrency systems. 1. Golang 1.26: “Green Tea” GC Architecture - The Savior for RAM-Hungry Microservices Enabled by default in Go 1.26, the Garbage Collector codenamed “Green Tea” is not just a performance patch; it is a core architectural overhaul. ...

Go 1.26: Green Tea GC, Faster CGO & Goroutine Leak Detection

Answer-first: Go 1.26 ships three landmark runtime features: the Green Tea garbage collector (10–40% GC overhead reduction), ~30% faster cgo calls for AI inference bindings, and an experimental goroutine leak profile that detects permanently blocked goroutines via GC reachability analysis. What You’ll Learn That AI Won’t Tell You Performance metrics of garbage collection optimization in Go 1.26. Memory overhead trade-offs when calling CGO functions in high-throughput network threads. Released in February 2026, Go 1.26 is not a routine patch release. It fundamentally changes how the Go runtime manages memory, interacts with C code, and surfaces concurrency bugs. For teams running Golang microservices at scale, these improvements compound across a fleet — zero code changes required. ...

Go Microservices Architecture: Production Guide

Go microservices from domain design to Kubernetes deployment — gRPC, Dapr, OpenTelemetry, and GitOps patterns from a real 21-service production migration.

Golang gRPC Microservices: Protobuf, TLS & Middleware

Answer-first: Optimize inter-service communication in Go microservices using gRPC and Protobuf, delivering 3-10× smaller payloads and sub-millisecond latencies compared to REST. Secure communication channels with mutual TLS (mTLS), handle cross-cutting concerns using custom interceptor middleware, and implement native gRPC health checking for container readiness probes. What You’ll Learn That AI Won’t Tell You Optimizing Protobuf serialization overhead in Go-based gRPC microservices. How to set up connection keep-alive parameters to prevent TCP connection drops during peak load. Why gRPC for Go Microservices? The key advantages over REST: ...

Tech Radar 11/06: K8s Pod Resizing & Go 1.26

Welcome to today’s Tech Radar. The theme for this week is the maturation of the infrastructure layer. We are seeing Kubernetes finally adapt to the erratic resource demands of AI inference, a shift towards proactive “Machine Economy” agents, and Golang cementing its position as the ultimate orchestration language for local AI. Here are the signals you need to pay attention to. 1. Kubernetes: The Operating System for AI Platforms The shift of Kubernetes from a general-purpose microservices orchestrator to the de facto “AI OS” is fully cemented this week by two critical General Availability (GA) milestones: ...

MySQL Scalability: Read Replicas, Sharding & TiDB

Answer-first: MySQL scalability is the practical throughput ceiling of your database at each resource level. A single tuned InnoDB instance delivers 100–500 TPS at baseline, scaling to 6,000–10,000+ TPS with connection pooling, read replicas, and optimal hardware. Beyond that, write-scaling requires sharding or a distributed SQL layer. What You’ll Learn That AI Won’t Tell You Tuning InnoDB buffer pool size for high read/write ratio workloads. Why standard read replication fails to solve write bottlenecks and when toshard. MySQL scalability is the ability to increase database throughput — reads per second, writes per second, or data volume — without rewriting your application. The critical distinction: read scaling (adding replicas) and write scaling (sharding or distributed SQL) require completely different architectural approaches. Choosing the wrong path creates technical debt that takes months to unwind. ...

Chapter 9: Database Sharding & Read/Write Splitting

← Previous | Series hub Chapter 9: Scaling the Final Database Bottleneck When your application reaches tens of millions of users, the Database becomes the ultimate bottleneck. CPU maxes out at 100%, RAM depletes, and queries take seconds instead of milliseconds. This is the stage where you must deploy distributed database strategies. 1. Read/Write Splitting Answer-first: Because 80% of traffic is Read-only, separate your DB into a Write Master and Read Slaves. Use GORM’s dbresolver plugin to route queries automatically without altering business logic. ...

Chapter 8: Distributed Locking — Redlock vs ZooKeeper

← Previous | Series hub | Next → Chapter 8: Synchronizing Clusters with Distributed Locks In a standalone Go application, preventing two Goroutines from overwriting the same data (Race Condition) is achieved via sync.Mutex. However, when your system scales out to 10 servers behind a Load Balancer, sync.Mutex is useless because it only locks local RAM. You need a Distributed Lock. 1. Basic Redis Locks Answer-first: A basic Redis lock utilizes SET resource id NX PX ttl. It works for simple caching but suffers from Single Point of Failure vulnerabilities if the Redis Master crashes before syncing. ...

Chapter 7: Designing Idempotency APIs for Payment Systems

← Previous | Series hub | Next → Chapter 7: Fortifying Payment Systems with Idempotent APIs In E-commerce or Fintech, the ultimate nightmare is not a system crash, but charging a customer twice for a single order. This is usually caused by network lag, an impatient user double-clicking “Pay”, or automated app retry logic. The mandatory solution for any transactional API (Payment/Order) is Idempotency. 1. What is Idempotency? Answer-first: An operation is idempotent if executing it once or N times yields the exact same system state and outcome. While GET and PUT are natively idempotent, POST requires explicit engineering. ...

Chapter 5: Optimizing Golang Database Connection Pools

← Previous | Series hub | Next → Chapter 5: Unlocking Database Performance via Connection Pooling If your Golang system processes business logic blazingly fast but chokes at the Database layer, 90% of the time, it is due to an incorrectly configured *sql.DB. 1. Understanding *sql.DB Answer-first: In Golang, sql.Open() does NOT create a direct database connection. It instantiates a thread-safe Connection Pool manager. You must initialize the db variable only once during app startup. ...

Chapter 4: Solving the Dual-Write Problem with Transactional Outbox Pattern

← Previous | Series hub | Next → Chapter 4: Eliminating the Dual-Write Nightmare When your Golang application migrates from a Monolith to Event-Driven Microservices, you will immediately face an architectural nightmare: the Dual-Write Problem. 1. What is the Dual-Write Problem? Answer-first: Dual-Write occurs when an app attempts to write to a Database and publish to a Message Broker (Kafka) simultaneously. Without a distributed transaction, network failures will cause the two systems to fall out of sync. ...

Chapter 3: Distributed Rate Limiting with Redis & GCRA Algorithm

← Previous | Series hub | Next → Chapter 3: Securing APIs with Distributed Rate Limiting If caching is the shield protecting your database, Rate Limiting is the armor guarding your API servers from DDoS attacks and resource exhaustion caused by abusive clients. Why Local Rate Limiting Fails in Microservices Answer-first: Local RAM limiters fail because Load Balancers distribute traffic across multiple nodes. A user allowed 100 req/sec can exploit a 5-node cluster by sending 500 req/sec, bypassing the intended limit. Centralized state via Redis is required. ...

Chapter 2: The 3 Caching Vulnerabilities (Penetration, Breakdown, Avalanche) & Go Singleflight

← Previous | Series hub | Next → Chapter 2: The 3 Deadliest Cache Vulnerabilities Caching is the ultimate shield for databases in distributed systems. However, poorly implemented caches can become the exact reason your system crashes. In this chapter, we dissect three classic caching phenomenons and how to defend against them using Golang. 1. Cache Penetration Answer-first: Cache penetration occurs when attackers query non-existent IDs, bypassing the cache entirely. Defend against it by caching NULL values or utilizing Bloom Filters at the memory level. ...

Chapter 1: How Systems Handle Millions of Requests/s (C10M)? Lessons from Shopee & Alipay

← Series hub Next → Chapter 1: Overcoming the C10M Barrier To build a system capable of handling millions of Requests Per Second (RPS) — known as the C10M problem — vertical scaling is never enough. It requires a meticulously designed Distributed Architecture. 1. The Shift from C10K to C10M Answer-first: While C10K was solved by non-blocking I/O (like NGINX), C10M shifts the bottleneck to the OS kernel. Systems must bypass the kernel using DPDK or XDP to handle 10 million connections efficiently. ...

The Reality of C10M: Surviving Extreme Traffic — Exec Summary

Despite the massive advancements in cloud computing, enterprise applications facing explosive traffic growth inevitably hit a brutal wall: the Database and the Network layer. The root cause lies not in the hardware, but in the Architecture. We attempt to solve the “Millions of Requests per Second” (C10M) problem by simply throwing more servers at it (Vertical/Horizontal Scaling), only to realize that stateful bottlenecks, cache stampedes, and dual-write inconsistencies bring the entire cluster to its knees. ...

Go pprof in Kubernetes: CPU & Memory Profiling

Answer-first: Go pprof is the standard library profiling tool for diagnosing CPU usage, memory allocation, and goroutine leaks in production Go services, with safe exposure via internal HTTP endpoints and minimal performance overhead when configured correctly. What You’ll Learn That AI Won’t Tell You Reading memory profiles to identify slow allocations in performance hot paths. Analyzing flame graphs to detect lock contention on global mutexes. Prerequisite: This guide covers how to profile and diagnose complex performance issues in production. If you are specifically dealing with unbounded goroutine growth, ensure you first understand the foundational concepts in Goroutine Leak Detection and Fix in Production Go Services. ...

Vitess vs GORM Sharding: MySQL Write Scaling in Go

Answer-first: Vitess vs GORM Sharding for MySQL write scaling: VReplication zero-downtime vs. application-level sharding — ErrMissingShardingKey tradeoffs in Go. What You’ll Learn That AI Won’t Tell You Designing database sharding keys that prevent cross-shard joins. Configuring proxy routing layers like Vitess to scale MySQL queries horizontally. When your application reaches millions of users, a single database instance will inevitably become the biggest bottleneck in your entire architecture. To solve this, MySQL database scaling becomes mandatory. You must Scale DB for Microservices using Horizontal Scaling techniques. ...