HNSW

In Part 2: Data Ingestion & Atomic Chunking - Bringing Product Data into the AI Environment, we established a clean data synchronization pipeline from PostgreSQL to Qdrant via Kafka CDC. But the journey of building a standard e-commerce search engine has just begun. When a user enters: “Asus ROG Zephyrus G14 laptop under $1500 in stock” If using purely Dense Vector Search: The system might return other Asus ROG Zephyrus laptops priced at $2000, or even older out-of-stock models, because the Embedding model only understands general semantic similarity and cannot process strict mathematical comparisons (Hard Filters like price < 1500 and in_stock = true). If using purely Lexical Search (BM25): The system fails when the user searches by intent, such as “thin and light high-performance gaming laptop”, because these keywords do not appear directly in the product description text. The optimal solution for e-commerce is Hybrid Search — combining Dense Search (semantic understanding), Sparse Search/BM25 (exact keyword and SKU matching), and Filterable HNSW (high-performance hard attribute filtering). ...