Part 9: Agentic Observability - Monitoring & Debugging the AI's Train of Thought

1. The “Black Box” Problem & the Inadequacy of Traditional APM

In traditional software systems (web/app), you can monitor with APM (Application Performance Monitoring) tools such as Datadog or New Relic. If the system returns HTTP 200 OK, you can usually assume the request succeeded. If it returns HTTP 500, you open the logs to find which line of code failed. With AI agents, this logic collapses. An agentic system can promptly return HTTP 200 OK, without throwing any exception, while the returned content is flawed financial advice (a hallucination) that could cost the company millions of dollars. ...

May 17, 2026 · 4 min · Lê Tuấn Anh

Part 10: Production Evals & CI/CD for AI - The Final Checkpoint

1. The End of the “Vibe Check” Era

A few years ago, testing an AI system went like this: the programmer tweaked the prompt file, typed a few questions into the chatbox, skimmed the answers to see whether they sounded reasonable (the vibe check), shouted “Looks Good To Me” (LGTM), and hit Deploy to Production. In 2026, this approach is considered catastrophic. AI is a non-deterministic system: it may answer correctly today, but if you change a single word in the prompt or switch to a new LLM version tomorrow, it might hallucinate in a corner you never tested. To deploy AI in enterprise services, you must move from intuition-based testing to statistical testing. ...

May 17, 2026 · 4 min · Lê Tuấn Anh