Hybrid AI Architecture & Self-Hosted vLLM | SLM Playbook

In the early phase of the AI wave (2023-2024), the default architecture for most startups and enterprises was API-centric: every request was routed to OpenAI’s GPT-4 or Anthropic’s Claude. While convenient for the proof-of-concept (PoC) phase, this model breaks down under production load when it hits two walls: data privacy regulations and high operational costs. By 2026, the rise of Small Language Models (SLMs) in the 2B to 14B parameter range has shifted the landscape. Models such as Microsoft’s Phi-4 (14B), Qwen 2.5/3.5 Coder (7B/14B), and Llama 3 8B, when properly fine-tuned, achieve performance close to, or even exceeding, that of commercial frontier models on narrow, domain-specific tasks. ...

May 21, 2026 · 9 min · Lê Tuấn Anh

21-Service E-commerce Blueprint: Architecture & Traffic

Answer-first: A complete architectural blueprint of a 21-service e-commerce platform written in Go, covering domain boundaries, traffic flow, and event-driven patterns. When transitioning from a monolithic platform to a distributed microservice setup, the hardest question isn’t “How do we write the code?” but “How do these moving parts talk to each other safely, and why is each boundary drawn exactly where it is?” This post is the architectural anchor for the full composable commerce series. It presents the complete system blueprint and explains the reasoning behind each domain boundary. For deep dives into specific layers, each section links to the dedicated post in the series. ...

April 12, 2026 · 7 min · Lê Tuấn Anh