Part 9: Agentic Observability - Monitoring & Debugging the AI's Train of Thought

1. The “Black Box” Problem and the Inadequacy of Traditional APM. In traditional software systems (web and app backends), you can monitor with APM (Application Performance Monitoring) tools such as Datadog or New Relic. If the system returns HTTP 200 OK, you assume everything is working. If it returns HTTP 500, you open the logs to see which line of code failed. With AI agents, this logic collapses completely. An agentic system can promptly return HTTP 200 OK without throwing a single exception, yet the content it returns could be flawed financial advice (a hallucination) that costs the company millions of dollars. ...

May 17, 2026 · 4 min · Lê Tuấn Anh

Goroutine Leak Detection and Fix in Production Go Services

Answer-first: Learn how to detect, diagnose, and fix goroutine leaks in production Go microservices using pprof, goleak, and the new Go 1.26 goroutineleak profile. A Kubernetes pod abruptly restarts with exit code 137. The memory dashboard shows a slow, perfectly linear staircase pattern stretching over three days. There are no panic logs in stdout, no database errors, and no abnormal CPU spikes. Just a slow, silent OOM (out-of-memory) death. ...

May 26, 2026 · 15 min · Lê Tuấn Anh