Chapter 3: Distributed Rate Limiting with Redis & GCRA Algorithm

← Previous | Series hub | Next → Chapter 3: Securing APIs with Distributed Rate Limiting If caching is the shield protecting your database, Rate Limiting is the armor guarding your API servers from DDoS attacks and resource exhaustion caused by abusive clients. Why Local Rate Limiting Fails in Microservices Answer-first: Local RAM limiters fail because Load Balancers distribute traffic across multiple nodes. A user allowed 100 req/sec can exploit a 5-node cluster by sending 500 req/sec, bypassing the intended limit. Centralized state via Redis is required. ...

June 9, 2026 · 3 min · Lê Tuấn Anh

Chapter 2: The 3 Caching Vulnerabilities (Penetration, Breakdown, Avalanche) & Go Singleflight

← Previous | Series hub | Next → Chapter 2: The 3 Deadliest Cache Vulnerabilities Caching is the ultimate shield for databases in distributed systems. However, poorly implemented caches can become the exact reason your system crashes. In this chapter, we dissect three classic caching phenomenons and how to defend against them using Golang. 1. Cache Penetration Answer-first: Cache penetration occurs when attackers query non-existent IDs, bypassing the cache entirely. Defend against it by caching NULL values or utilizing Bloom Filters at the memory level. ...

June 9, 2026 · 3 min · Lê Tuấn Anh

Chapter 1: How Systems Handle Millions of Requests/s (C10M)? Lessons from Shopee & Alipay

← Series hub Next → Chapter 1: Overcoming the C10M Barrier To build a system capable of handling millions of Requests Per Second (RPS) — known as the C10M problem — vertical scaling is never enough. It requires a meticulously designed Distributed Architecture. 1. The Shift from C10K to C10M Answer-first: While C10K was solved by non-blocking I/O (like NGINX), C10M shifts the bottleneck to the OS kernel. Systems must bypass the kernel using DPDK or XDP to handle 10 million connections efficiently. ...

June 9, 2026 · 3 min · Lê Tuấn Anh

The Reality of C10M: Surviving Extreme Traffic — Exec Summary

Despite the massive advancements in cloud computing, enterprise applications facing explosive traffic growth inevitably hit a brutal wall: the Database and the Network layer. The root cause lies not in the hardware, but in the Architecture. We attempt to solve the “Millions of Requests per Second” (C10M) problem by simply throwing more servers at it (Vertical/Horizontal Scaling), only to realize that stateful bottlenecks, cache stampedes, and dual-write inconsistencies bring the entire cluster to its knees. ...

June 9, 2026 · 3 min · Lê Tuấn Anh

Go pprof in Kubernetes: Remote CPU & Memory Profiling Without Restarting Pods

Prerequisite: This guide covers how to profile and diagnose complex performance issues in production. If you are specifically dealing with unbounded goroutine growth, ensure you first understand the foundational concepts in Goroutine Leak Detection and Fix in Production Go Services. Performance degradation in production is inevitable. When a Go microservice suddenly spikes to 90% CPU utilization or triggers an Out-Of-Memory (OOM) kill in Kubernetes, guessing the root cause by staring at the code is rarely effective. You need data. ...

June 2, 2026 · 10 min · Lê Tuấn Anh

Vitess vs GORM Sharding: MySQL Write Scaling in Go (ErrMissingShardingKey Deep-Dive)

Answer-first: Vitess vs GORM Sharding for MySQL write scaling: VReplication zero-downtime vs. application-level sharding — ErrMissingShardingKey tradeoffs in Go. When your application reaches millions of users, a single database instance will inevitably become the biggest bottleneck in your entire architecture. To solve this, MySQL database scaling becomes mandatory. You must Scale DB for Microservices using Horizontal Scaling techniques. This article delves into the differences between scaling methods and compares the two most popular Sharding architectures today: Middleware-level Sharding (Vitess) and Application-level Sharding in Go (GORM Sharding plugin). ...

June 1, 2026 · 6 min · Lê Tuấn Anh

Goroutine Leak Detection and Fix in Production Go Services

Answer-first: Learn how to detect, diagnose, and fix goroutine leaks in production Go microservices using pprof, goleak, and the new Go 1.26 goroutineleak profile. A Kubernetes pod abruptly restarts with exit code 137. The memory metrics dashboard shows a slow, perfectly linear staircase pattern stretching over three days. There are no panic logs in stdout, no database errors, and no abnormal CPU spikes. Just a slow, silent OOM (Out Of Memory) death. ...

May 26, 2026 · 15 min · Lê Tuấn Anh

Architecting Agentic E-commerce Search with Golang

The search system is the beating heart of every e-commerce platform. If customers cannot find a product, they cannot buy it. However, as we move through 2026, user search behavior has evolved drastically from typing short, abrupt keywords (e.g., “men’s running shoes”) to submitting complex, goal-oriented queries (e.g., “find me a pair of men’s waterproof trail running shoes, size 42, under $100, that can be delivered by tomorrow”). Against these multifaceted intents, traditional search engines begin to show their limitations. ...

May 22, 2026 · 8 min · Lê Tuấn Anh

Tech Radar, May 10, 2026: Go 1.26 'Green Tea' GC, Kubernetes as AI OS, and Agentic Engineering

In the last 24 hours, the engineering landscape has seen a strong convergence of performance optimization and intelligent orchestration. The signals today emphasize that the foundational layers (languages and orchestrators) are evolving specifically to handle the next generation of AI and high-concurrency workloads. For platform engineers and backend developers, today’s radar translates these high-level shifts into actionable TechTask priorities: upgrading to Go 1.26 for immediate memory efficiency, re-evaluating Kubernetes cluster design for AI workloads, and exploring agent-driven automation in deployment pipelines. ...

May 10, 2026 · 4 min · Lê Tuấn Anh

Tech Radar, May 2, 2026: 24-Hour TechTask Signals - Commerce Modernization Is Becoming an Operations Problem

The strongest TechTask signal in the last 24 hours is not a single framework release. It is the way several platform updates are converging on the same message: commerce modernization is no longer mainly about decomposing a monolith. It is about operating the decomposed system safely. That matters directly for the engineering profile behind this site: Strangler Fig migration from Magento/PHP into a 21-service Golang ecosystem, Dapr Pub/Sub for distributed workflows, Saga compensation for checkout and payment failure, Transactional Outbox for reliable events, GitOps through Kubernetes and ArgoCD, and performance work that pushed p95 latency from 1.2s to 120ms under high-traffic commerce load. ...

May 2, 2026 · 8 min · Lê Tuấn Anh