Part 2: Agentic Ingestion & Multimodal Knowledge Graphs

1. The Fall of Traditional OCR: The “Garbage In, Garbage Out” Pain In Enterprise RAG architecture, the most ruthless formula is: Garbage In = Garbage Out. Before 2025, data engineers often used traditional OCR tools (like Tesseract, PyMuPDF) to extract text from PDF documents. The result was a disaster: Financial report table structures were shattered, data columns were merged together, and technical diagrams were completely ignored. When a Vector Database contains a messy, contextless heap of text (Context loss), no matter how powerful the LLM is, the answer you receive will only be a Hallucination. ...

May 17, 2026 · 4 min · Lê Tuấn Anh

Tech Radar, April 27, 2026: Mistral Small 4 — One Open-Source Model to Rule Chat, Reasoning, and Agents

Mistral released Small 4 this week — a 119B parameter model that consolidates what previously required three separate models. Under the Apache 2.0 license and optimized for both latency and throughput, Small 4 represents a strategic inflection point in the open-source model ecosystem. The key innovation is not just technical performance. It is the unified architecture: Mistral has merged the capabilities of Magistral (reasoning), Pixtral (multimodal), and Devstral (agentic coding) into a single model with configurable behavior. Users no longer switch between specialized models — they configure one model to deliver fast responses, deep reasoning, or visual analysis as the task demands. ...

April 27, 2026 · 6 min · Lê Tuấn Anh