Go Observability & pprof — Memory Leaks, CPU Profiling & GODEBUG

Prerequisite: This is Part 10 of the System Design Masterclass. Previous parts built the architecture; this part teaches you how to see inside a running system and diagnose production performance issues. Answer-first: Go’s built-in pprof profiler provides CPU sampling, heap allocation analysis, goroutine stack inspection, and block profiling, all available as HTTP endpoints in running production services with minimal overhead. A heap diff between two snapshots is often the fastest way to identify memory leaks. ...

June 18, 2026 · 9 min · Tanh

Go pprof in Kubernetes: Remote CPU & Memory Profiling Without Restarting Pods

Prerequisite: This guide covers how to profile and diagnose complex performance issues in production. If you are specifically dealing with unbounded goroutine growth, first make sure you understand the foundational concepts in Goroutine Leak Detection and Fix in Production Go Services. Performance degradation in production is inevitable. When a Go microservice suddenly spikes to 90% CPU utilization or triggers an out-of-memory (OOM) kill in Kubernetes, guessing the root cause by staring at the code is rarely effective. You need data. ...

June 2, 2026 · 10 min · Lê Tuấn Anh

Go pprof in Kubernetes: Remote Profiling & Flame Graphs

Answer-first: How to safely profile CPU, memory, and goroutines in Go services running in Kubernetes using kubectl port-forward, pprof, and Pyroscope. You’ve instrumented your Go service with net/http/pprof, run go tool pprof locally against the development binary, and spotted the hot path in your flame graph. Then you deploy to Kubernetes and the bottleneck disappears — because the workload profile in Kubernetes differs from local testing (different request mix, connection pool pressure, GC behavior under actual memory pressure, scheduler interference from co-located pods). ...

June 1, 2026 · 13 min · Lê Tuấn Anh

Goroutine Leak Detection and Fix in Production Go Services

Answer-first: Learn how to detect, diagnose, and fix goroutine leaks in production Go microservices using pprof, goleak, and the new experimental goroutineleak profile in Go 1.26. A Kubernetes pod abruptly restarts with exit code 137. The memory metrics dashboard shows a slow, perfectly linear staircase pattern stretching over three days. There are no panic logs in stdout, no database errors, and no abnormal CPU spikes. Just a slow, silent out-of-memory (OOM) death. ...

May 26, 2026 · 15 min · Lê Tuấn Anh