Part 2 — State, Memory & Context Management

Prerequisite: To firmly grasp the foundational concepts of Memory Architecture in AI systems, please review Comprehensive AI-Native System Architecture.

After solving the Agent communication challenge in Part 1, we must face the LLM’s greatest enemy: Context Window limits. Even the best Orchestrator is useless if Worker Agents forget the User’s initial request after just a few tool-calling turns.

2.1. The Context Window Problem and Why Agents “Forget”

Large Language Models (LLMs) are inherently Stateless. Every time you send a prompt, the LLM rereads the entire text from beginning to end. ...
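To make the statelessness point concrete, here is a minimal sketch of a conversation whose full history is resent on every call. The `Conversation` class, the word-count token estimate, and the "keep the newest messages" truncation policy are illustrative assumptions, not something the article specifies.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Very rough token estimate (assumption): one token per word."""
    return len(text.split())


@dataclass
class Conversation:
    messages: list = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def prompt_tokens(self) -> int:
        """Tokens the model must reread on the next call: the whole history."""
        return sum(estimate_tokens(m["content"]) for m in self.messages)

    def truncated(self, budget: int) -> list:
        """Naive truncation: keep only the newest messages that fit the budget."""
        kept, used = [], 0
        for message in reversed(self.messages):
            cost = estimate_tokens(message["content"])
            if used + cost > budget:
                break
            kept.append(message)
            used += cost
        return list(reversed(kept))


conv = Conversation()
conv.add("user", "Find a waterproof trail shoe in size 42")
for _ in range(3):
    # Each tool-calling turn appends more text that is resent next time.
    conv.add("tool", "result " * 10)

total = conv.prompt_tokens()
window = conv.truncated(budget=25)
```

Under this hypothetical policy, the three tool results push the original user request out of the 25-token window, which is one way an agent can "forget" the initial request.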

May 17, 2026 · 5 min · Lê Tuấn Anh

Qdrant Hybrid Search: Solving Semantic and Hard Filters

In Part 2: Data Ingestion & Atomic Chunking - Bringing Product Data into the AI Environment, we established a clean data synchronization pipeline from PostgreSQL to Qdrant via Kafka CDC. But the journey of building a standard e-commerce search engine has only just begun.

Suppose a user enters: “Asus ROG Zephyrus G14 laptop under $1500 in stock”

With pure Dense Vector Search: The system might return other Asus ROG Zephyrus laptops priced at $2000, or even older out-of-stock models, because the Embedding model only captures general semantic similarity and cannot apply strict comparisons (Hard Filters like price < 1500 and in_stock = true).

With pure Lexical Search (BM25): The system fails when the user searches by intent, such as “thin and light high-performance gaming laptop”, because these keywords do not appear verbatim in the product description text.

The optimal solution for e-commerce is Hybrid Search: a combination of Dense Search (semantic understanding), Sparse Search/BM25 (exact keyword and SKU matching), and Filterable HNSW (high-performance hard attribute filtering). ...
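The combination described above can be sketched in plain Python. This is a toy illustration under stated assumptions: the catalog, vectors, and prices are made up, a term-overlap count stands in for BM25, reciprocal rank fusion is one common way to merge rankings (the excerpt does not say which fusion method the article uses), and the hard filter is applied to a small in-memory list rather than inside an HNSW index as Qdrant would do.

```python
import math

# Toy catalog: names, prices, and vectors are made-up illustrative values.
PRODUCTS = [
    {"id": 1, "name": "Asus ROG Zephyrus G14 2024", "price": 1449, "in_stock": True, "vec": [0.9, 0.1]},
    {"id": 2, "name": "Asus ROG Zephyrus G16", "price": 1999, "in_stock": True, "vec": [0.88, 0.15]},
    {"id": 3, "name": "Asus ROG Zephyrus G14 2022", "price": 1199, "in_stock": False, "vec": [0.85, 0.2]},
    {"id": 4, "name": "Acer Swift thin laptop", "price": 899, "in_stock": True, "vec": [0.3, 0.9]},
]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def keyword_score(query, text):
    """Stand-in for BM25 (assumption): number of query terms found in the text."""
    terms = set(text.lower().split())
    return sum(1 for t in query.lower().split() if t in terms)


def rrf(rankings, k=60):
    """Reciprocal rank fusion over several ranked lists of product ids."""
    scores = {}
    for ranking in rankings:
        for rank, pid in enumerate(ranking, start=1):
            scores[pid] = scores.get(pid, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def hybrid_search(query, query_vec, max_price, in_stock=True):
    # Hard filters: exact comparisons that an embedding cannot express.
    candidates = [p for p in PRODUCTS if p["price"] < max_price and p["in_stock"] == in_stock]
    dense = sorted(candidates, key=lambda p: cosine(query_vec, p["vec"]), reverse=True)
    sparse = sorted(candidates, key=lambda p: keyword_score(query, p["name"]), reverse=True)
    return rrf([[p["id"] for p in dense], [p["id"] for p in sparse]])


results = hybrid_search("asus rog zephyrus g14", [0.9, 0.1], max_price=1500)
```

Here the $1999 model and the out-of-stock model never reach the ranking step, so semantic closeness alone cannot bring them back into the results.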

May 22, 2026 · 7 min · Vesviet Team

GraphRAG vs Naive RAG: Enterprise Architecture Guide

Answer-first: Compare Naive RAG with GraphRAG for enterprise AI pipelines: knowledge graphs, LlamaIndex, chunking, streaming CDC, and security controls for dynamic data.

Most RAG (Retrieval-Augmented Generation) implementations look the same: chunk documents, embed them into vectors, store them in a vector database, retrieve by cosine similarity, and inject the top-K chunks into the LLM context. This works for simple document Q&A. It fails systematically for enterprise knowledge bases where the answer to a question depends not on a single document chunk, but on the relationships between dozens of interconnected entities. ...
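The Naive RAG steps listed above (chunk, embed, store, retrieve by cosine similarity, inject top-K) can be sketched as follows. Everything here is a simplified assumption for illustration: the embedding is a bag-of-words count rather than a real model, the "vector database" is a Python list, and the sample document text is made up.

```python
import math
from collections import Counter


def chunk(text, size=8):
    """Fixed-size chunking by word count."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding (assumption): bag-of-words counts instead of a model call."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, store, k=2):
    """Rank every stored chunk independently and return the top-K texts."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Inject the retrieved chunks into the context sent to the LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Made-up sample text; each sentence is six words, so size=6 yields one chunk per sentence.
document = (
    "Qdrant stores vectors for product search. "
    "Kafka streams change events from PostgreSQL. "
    "LlamaIndex builds indexes over enterprise documents."
)
store = [(c, embed(c)) for c in chunk(document, size=6)]
question = "Which system streams change events?"
top = retrieve(question, store, k=1)
prompt = build_prompt(question, top)
```

Note that each chunk is scored in isolation, so nothing in this loop can follow a relationship that spans several chunks, which is the limitation the excerpt attributes to Naive RAG.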

June 1, 2026 · 12 min · Lê Tuấn Anh

Architecting Agentic E-commerce Search with Golang

The search system is the beating heart of every e-commerce platform. If customers cannot find a product, they cannot buy it. However, as we move through 2026, user search behavior has shifted drastically from typing short, terse keywords (e.g., “men’s running shoes”) to submitting complex, goal-oriented queries (e.g., “find me a pair of men’s waterproof trail running shoes, size 42, under $100, that can be delivered by tomorrow”). Faced with such multifaceted intents, traditional search engines begin to show their limitations. ...

May 22, 2026 · 8 min · Lê Tuấn Anh