In a previous blog post, we built a simple retrieval pipeline: we chunked the documentation, embedded the text, and queried PostgreSQL with vector similarity to find relevant passages. In the most recent post, we shifted to the ingestion side, extracting facets such as version, operating system, document type, and component, and storing them alongside the embeddings in our database.
Now it’s time to bring those two threads together.
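As a rough preview of what "together" could mean, here is a minimal sketch that builds one parameterized query: facet filters narrow the candidate rows, then vector distance ranks what is left. Everything specific in it is an assumption for illustration, not something taken from the earlier posts: the `doc_chunks` table, the `id`, `content`, and `embedding` columns, the facet column names, the use of pgvector's `<=>` cosine-distance operator, and psycopg-style `%s` placeholders.

```python
from typing import Mapping, Sequence

# Hypothetical facet columns, named after the facets mentioned above.
ALLOWED_FACETS = ("version", "operating_system", "document_type", "component")


def build_faceted_query(
    query_embedding: Sequence[float],
    facets: Mapping[str, str],
    limit: int = 5,
) -> tuple[str, list]:
    """Return (sql, params) for a facet-filtered similarity search.

    Assumes a hypothetical `doc_chunks` table with an `embedding` column
    of pgvector's `vector` type.
    """
    unknown = set(facets) - set(ALLOWED_FACETS)
    if unknown:
        raise ValueError(f"unknown facets: {sorted(unknown)}")

    # pgvector accepts a text literal such as '[0.1,0.2]' cast to vector.
    vector_literal = "[" + ",".join(str(float(x)) for x in query_embedding) + "]"

    where, params = [], []
    # Iterate in a fixed order so the generated SQL is deterministic.
    # Column names come only from the whitelist, never from user input.
    for name in ALLOWED_FACETS:
        if name in facets:
            where.append(f"{name} = %s")
            params.append(facets[name])

    sql = "SELECT id, content FROM doc_chunks"
    if where:
        sql += " WHERE " + " AND ".join(where)
    # `<=>` is pgvector's cosine distance: smaller means more similar.
    sql += " ORDER BY embedding <=> %s::vector LIMIT %s"
    params += [vector_literal, limit]
    return sql, params
```

With a driver such as psycopg, the result would be passed along as `cursor.execute(sql, params)`. Values always travel as parameters; only whitelisted column names are interpolated into the SQL text.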