
In a previous blog post, we built a simple retrieval pipeline: we chunked the documentation, embedded the text, and queried PostgreSQL with vector similarity to find relevant passages. Then, in the most recent post, we shifted gears to the ingestion side, extracting facets such as version, operating system, document type, and component, and storing them alongside...

In my previous blog post, we built the simplest possible Retrieval-Augmented Generation (RAG) pipeline inside PostgreSQL. We embedded our manuals, stored those vectors in a table, ran a similarity search, and handed the top 5 results straight to a Large Language Model. The result was encouraging: we could already see the model drawing on our content rather than inventing information. But as with...
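The retrieve-then-generate loop described in that post can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the post's implementation: in the post the similarity search runs inside PostgreSQL, whereas here the ranking is done over an in-memory list so the example runs without a database. The toy 3-dimensional vectors, the passage titles, the prompt wording, and the helper names `cosine_distance`, `top_k`, and `build_prompt` are all hypothetical.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, up to 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query: list[float], rows: list[tuple[str, list[float]]], k: int = 5):
    """Rank (text, embedding) rows by cosine distance to the query; keep the k closest."""
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Place the retrieved passages in front of the question as grounding context."""
    context = "\n\n".join(passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical rows standing in for the table of embedded manual chunks.
# Real embeddings have hundreds or thousands of dimensions; 3 keeps this readable.
rows = [
    ("Installing the server package", [0.9, 0.1, 0.0]),
    ("Upgrading between major versions", [0.7, 0.5, 0.2]),
    ("Configuring streaming replication", [0.1, 0.9, 0.1]),
    ("Tuning shared_buffers", [0.2, 0.2, 0.9]),
    ("Backing up with pg_dump", [0.3, 0.1, 0.8]),
    ("Release notes", [0.0, 0.1, 1.0]),
]

# Hypothetical embedding of the user's question.
query_embedding = [0.85, 0.2, 0.05]
hits = top_k(query_embedding, rows, k=5)
prompt = build_prompt("How do I install PostgreSQL?", [text for text, _ in hits])
# `prompt` would then be sent to the LLM; that call is out of scope for this sketch.
```

Keeping retrieval (`top_k`) separate from prompt assembly (`build_prompt`) mirrors the two halves of the pipeline, so either side can later be swapped out, for example by moving the ranking back into SQL.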

Embeddings are the foundation of vector search, allowing us to represent meaning-rich content like documents or queries as numerical vectors. But to use them effectively, it’s essential to understand what’s actually being embedded—whether that’s individual words, full sentences, or larger chunks of text.
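To make the granularity question concrete, the following Python sketch splits the same text three ways: into words, into sentences, and into fixed-size overlapping word windows. Each resulting unit is what would be sent to an embedding model to produce one vector. The helper names, the naive regex sentence splitter, and the default window of 8 words with 2 words of overlap are illustrative assumptions, not a recommendation; real chunkers usually count model tokens rather than whitespace-separated words.

```python
import re


def words(text: str) -> list[str]:
    """Word-level units: one embedding per whitespace-separated token."""
    return text.split()


def sentences(text: str) -> list[str]:
    """Sentence-level units, using a naive split after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunks(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Chunk-level units: windows of `size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    toks = text.split()
    step = size - overlap
    out = []
    for start in range(0, len(toks), step):
        out.append(" ".join(toks[start:start + size]))
        if start + size >= len(toks):  # this window already reaches the end
            break
    return out


doc = (
    "PostgreSQL stores embeddings in a vector column. "
    "An index speeds up nearest-neighbour search. "
    "Each query is embedded with the same model."
)

for label, units in [("words", words(doc)), ("sentences", sentences(doc)), ("chunks", chunks(doc))]:
    print(f"{label}: {len(units)} units to embed")
```

Smaller units typically give more precise matches but carry less context per hit; larger chunks bring more context into a prompt but can blend several topics into a single vector. The overlap keeps a phrase that straddles a window boundary from being split across two vectors with no shared context.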

As vector search becomes a foundational feature in modern applications—from semantic search and recommendation engines to AI-driven insights—developers are increasingly adopting PostgreSQL with the pgvector extension. However, one concept often creates confusion: the difference between similarity and distance.
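The relationship between the two can be checked with a few lines of Python. This sketch reimplements, in plain Python, the quantities returned by pgvector's distance operators: `<->` (Euclidean/L2 distance), `<#>` (negative inner product), and `<=>` (cosine distance). The pgvector documentation recovers cosine similarity as `1 - (embedding <=> query)` and the inner product by multiplying the `<#>` result by -1. The vectors and function names below are made up for illustration.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance, the quantity pgvector's <-> returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: list[float], b: list[float]) -> float:
    """Inner product with its sign flipped, as <#> returns, so smaller means closer."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance (what <=> returns) = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


query = [1.0, 0.0]
docs = {
    "same direction": [2.0, 0.0],
    "diagonal": [1.0, 1.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}

for name, vec in docs.items():
    d = cosine_distance(query, vec)
    print(
        f"{name:15} cos_dist={d:.3f} cos_sim={1 - d:.3f} "
        f"l2={l2_distance(query, vec):.3f} inner={-negative_inner_product(query, vec):.1f}"
    )

# Ascending distance and descending similarity produce the same ranking.
by_distance = sorted(docs, key=lambda n: cosine_distance(query, docs[n]))
by_similarity = sorted(docs, key=lambda n: 1.0 - cosine_distance(query, docs[n]), reverse=True)
```

Because a distance operator is sorted ascending (`ORDER BY embedding <=> ... LIMIT k`), the nearest rows are the ones with the smallest distance, which are exactly the rows with the highest similarity. Note the ranges differ: cosine distance runs from 0 to 2, while cosine similarity runs from -1 to 1.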