Part 2: Agentic Ingestion & Multimodal Knowledge Graphs

1. The Fall of Traditional OCR: The “Garbage In, Garbage Out” Pain

In enterprise RAG architecture, the most ruthless rule is Garbage In = Garbage Out. Before 2025, data engineers often relied on traditional extraction tools (such as Tesseract for OCR or PyMuPDF for PDF text extraction) to pull text out of PDF documents. The results were disastrous: table structures in financial reports were shattered, data columns were merged together, and technical diagrams were ignored entirely. When a vector database holds a messy heap of text stripped of its context (context loss), then however powerful the LLM is, the answer you receive will be a hallucination. ...

May 17, 2026 · 4 min · Lê Tuấn Anh

GraphRAG vs Naive RAG: Enterprise Architecture Guide

Answer-first: Compare Naive RAG with GraphRAG for enterprise AI pipelines: knowledge graphs, LlamaIndex, chunking, streaming CDC, and security controls for dynamic data.

Most RAG (Retrieval-Augmented Generation) implementations look the same: chunk documents, embed them into vectors, store them in a vector database, retrieve by cosine similarity, and inject the top-K chunks into the LLM context. This works for simple document Q&A. It fails systematically for enterprise knowledge bases where the answer to a question depends not on a single document chunk, but on the relationships between dozens of interconnected entities. ...
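The Naive RAG steps described above (chunk, embed, store, retrieve by cosine similarity, inject the top-K chunks) can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database, and all function names and sample documents are hypothetical.

```python
import math
from collections import Counter

def chunk(text, size=12):
    """Split text into fixed-size word windows (no overlap)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding (assumption): lowercase bag-of-words counts."""
    return Counter(w.strip(".,?!").lower() for w in text.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, store, k=2):
    """Return the k chunk texts most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(query, chunks):
    """Inject the retrieved chunks into the LLM context."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical sample documents.
docs = [
    "Invoices are approved by the finance team within five days.",
    "The VPN requires multi-factor authentication for all employees.",
]

# "Vector store": a list of (chunk text, embedding) pairs.
store = [(c, embed(c)) for d in docs for c in chunk(d)]

query = "Who approves invoices?"
top = retrieve(query, store, k=1)
prompt = build_prompt(query, top)
```

Each chunk is scored against the query in isolation, so nothing in this pipeline can follow a relationship that spans several chunks or entities.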

June 1, 2026 · 12 min · Lê Tuấn Anh