Storing embeddings in a relational database like PostgreSQL is entirely feasible and, in many cases, preferable, especially when you're already managing structured data and need to combine similarity search with traditional filters, joins, or business logic.
That said, if you're not careful, embedding storage can quickly turn into a mess with schema sprawl, performance cliffs, update complexity, and general confusion over how to keep things maintainable and fast.
In this article, we’ll walk through how to design your schema, store vector embeddings efficiently, and query them in a way that won’t make you want to give up and run everything in a Python notebook.

You can efficiently store, query, and manage vector embeddings in PostgreSQL for powerful similarity searches, without sacrificing performance or maintainability.
Choosing the right data type: Why vector wins
The first question people usually run into is:
What data type should I use to store embeddings?
If you’re using the pgvector extension, the best choice is the vector type provided by the extension itself. This gives you a fixed-length vector column, supports distance operators like <=> (cosine distance), and can be indexed for approximate search.
You could, in theory, store an embedding as a float8[] array or a JSONB list, but doing so comes with trade-offs. Arrays and JSON are flexible, but they can't be indexed for similarity search and are slower to query. The vector type is opinionated, but in a good way: it's type-safe, faster, and integrates with IVFFlat indexing, which is crucial for performance once you move past small-scale test datasets.
Note: IVFFlat (Inverted File with Flat quantization) is a type of approximate nearest neighbor (ANN) index designed for fast similarity search over high-dimensional vectors, and it's the ANN index type we'll use throughout this article.
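As a minimal sketch of what that looks like in practice (assuming the pgvector extension is available on your server; the three-dimensional vectors are purely for illustration), enabling it and trying the distance operator takes only a couple of statements:

-- enable pgvector once per database
CREATE EXTENSION IF NOT EXISTS vector;

-- cosine distance between two toy vectors via the <=> operator
SELECT '[1,0,0]'::vector <=> '[0,1,0]'::vector AS cosine_distance;  -- returns 1 (orthogonal vectors)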
Schema design: Embeddings inline or in a separate table?
Let’s say you’ve got a table of boat brokerage listings and you want to store descriptions as embeddings. You can either add a vector column to your existing table, or you can store the embeddings in a separate table and reference the original row by ID. Which model you choose depends on whether the embeddings are tightly coupled to the source data or need to be versioned, refreshed, or generated from multiple models over time.
If your embeddings are generated once and remain stable, putting them in the same table is the simplest approach. But if you're iterating on models, testing different chunking strategies, or dealing with embeddings that may be updated independently of the source record, a separate embeddings table makes things easier to manage and avoids locking or bloating your main table.
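If you go the inline route, and assuming a 1536-dimensional embedding model, it can be as simple as a single added column (the column name here is illustrative):

-- add an embedding column directly to the source table
ALTER TABLE production.listing
ADD COLUMN description_embedding vector(1536);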
A clean setup for a separate table might look like this:
create table production.listing_embeddings (
listing_uid integer not null
references production.listing (uid),
chunk_index integer not null,
chunk_text text,
embedding vector(1536),
primary key (listing_uid, chunk_index)
);
This allows you to split a long listing description into multiple embedding chunks, associate them cleanly, and keep track of their order for reassembly if needed. It also makes it easier to run batch updates, re-embed only certain chunks, and store multiple rows per document without overloading your base table.
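For instance, re-embedding a single listing after its description changes only ever touches the embeddings table; a small sketch, with a made-up uid:

-- drop the stale chunks for one listing (the uid value is made up)...
DELETE FROM production.listing_embeddings
WHERE listing_uid = 1042;
-- ...then re-insert freshly generated chunks using the same batch INSERT pattern shown below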
Chunking strategy: Don’t let your embeddings fall apart
Chunking is a step people often underestimate, but it matters more than most realize. Poor chunking leads to noisy embeddings, degraded similarity scores, and inconsistent retrieval. One of the worst things you can do is chunk by fixed character count, because you’ll end up slicing sentences in half, destroying context. A better approach is to chunk by sentence, using something like nltk.sent_tokenize() or spaCy. That way, each embedding represents a full thought, which aligns better with how models learn semantics.
If you’re working with OpenAI models or anything that has a token limit (like 8192 tokens), you should chunk with token-awareness in mind. Use a tokenizer like tiktoken to estimate how many tokens each sentence or paragraph will take, and assemble chunks that fit comfortably within the model's limits, ideally ending on a sentence boundary. Some overlap between chunks (a sliding window of a few sentences) is also helpful when preserving continuity matters.
Practical example: Storing embeddings with Python and SQL
Below is a simple example of how we could embed these chunks into our table from within PostgreSQL. It uses a regular expression to chunk by sentence rather than tokenizing to align with the model's token limits (for a reason we'll come to shortly):
CREATE OR REPLACE FUNCTION split_sentences(text)
RETURNS text[] AS $$
import re
return re.split(r'(?<=[.!?])\s+', args[0])
$$ LANGUAGE plpython3u;
INSERT INTO production.listing_embeddings (listing_uid, chunk_index, chunk_text, embedding)
SELECT
l.uid AS listing_uid,
gs.chunk_index,
gs.sentence AS chunk_text,
ai.openai_embed('text-embedding-3-small', gs.sentence) AS embedding
FROM (
SELECT *
FROM production.listing
WHERE uid BETWEEN 1000 AND 1099 -- batch condition
) l
JOIN LATERAL (
SELECT
generate_series(1, array_length(sentences, 1)) - 1 AS chunk_index,
unnest(sentences) AS sentence
FROM (
SELECT split_sentences(l.description) AS sentences
) AS split
) AS gs ON true;
PostgreSQL's extensibility makes it possible to call out to AI models directly from within the database, and at first glance it's incredibly compelling. You can define a plpython3u function, load in the OpenAI SDK, and embed your text right from a SQL statement. This not only shows how PostgreSQL can integrate with modern AI workflows, but also lets you experiment quickly, test ideas on small batches, and see results inline with your relational data, without needing to leave your SQL editor.
This is valuable, especially in proof-of-concept phases or when building demonstrations. The simplicity of running SELECT ai.openai_embed(…) inside your database highlights the expressive power of Postgres and its suitability as a platform for AI-native applications. It shows that AI isn't just something bolted on externally, but something that can be deeply woven into the data layer itself.
The case against in-database embedding at scale
However, in practice, and particularly at scale, embedding text from within PostgreSQL comes with sharp edges.
First, there's the performance issue: each call to ai.openai_embed makes a synchronous API call to an external endpoint. If you're embedding hundreds or thousands of sentences, that quickly translates to hundreds or thousands of blocking HTTP calls, which tie up database backends and slow down your system considerably. PostgreSQL is designed to be a high-throughput transactional engine, not a client-side orchestrator for network-heavy LLM calls.
There’s also the problem of sandboxing. Extensions like plpython3u run in a restricted environment, and you'll often run into missing modules, path issues, or serialization limitations when trying to use third-party packages like nltk or openai.
For example, a function like the one below, which uses NLTK to split descriptions into sentences, would run into exactly this kind of sandbox issue:
CREATE OR REPLACE FUNCTION split_sentences_nltk(input text)
RETURNS TABLE(sentence TEXT) AS $$
import sys
sys.path.append('/Users/john/pg_python_libs')
import nltk
tokenizer = nltk.data.load('file:/Users/john/pg_python_libs/nltk_data/tokenizers/punkt/english.pickle')
return [(s,) for s in tokenizer.tokenize(input)]
$$ LANGUAGE plpython3u;
These problems are solvable, but the workarounds aren't easily portable across environments or maintainable at scale. What starts as a neat trick in development becomes a source of fragility in production. This is why I ended up using a regex approach in the example above to chunk the descriptions.
Because of this, it’s generally more practical to perform embedding in Python outside the database and write the results back into PostgreSQL. This allows you to batch requests efficiently (significantly reducing latency and token costs), catch errors more gracefully, and keep your database focused on managing and serving data. You can still maintain tight integration between your vector store and your business data; you’re just decoupling the heavy lifting.
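On the database side, the write-back is then just ordinary parameterized SQL. Here's a minimal sketch, assuming the listing_embeddings table from earlier and a driver that binds the embedding as a pgvector-compatible literal ($1..$4 are placeholders supplied by your application code):

-- upsert one chunk's text and embedding, computed outside the database
INSERT INTO production.listing_embeddings (listing_uid, chunk_index, chunk_text, embedding)
VALUES ($1, $2, $3, $4::vector)
ON CONFLICT (listing_uid, chunk_index)
DO UPDATE SET chunk_text = EXCLUDED.chunk_text,
              embedding  = EXCLUDED.embedding;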
Querying with similarity
Once your embeddings are stored, you’ll want to query them using similarity search. pgvector supports three distance metrics:
- Euclidean: <->
- Inner product: <#>
- Cosine distance: <=>
For textual data, cosine is usually the best choice because it’s scale-invariant. You can run queries like:
SELECT
l.uid
,l.name
,le.chunk_index
,le.embedding <=> (SELECT ai.openai_embed('text-embedding-3-small', 'performance cruiser')) AS distance
FROM production.listing_embeddings le
JOIN production.listing l ON l.uid = le.listing_uid
ORDER BY distance
LIMIT 10;
Indexing with IVFFlat
This works fine for small to mid-sized datasets, but the moment you’re searching through tens of thousands or more rows, you’ll notice it gets slow. That’s because it’s doing a brute-force linear scan. The fix here is to use an approximate index. pgvector supports IVFFlat indexing, which builds clusters of vectors and speeds up nearest neighbor search significantly. The trade-off is that it's approximate (not guaranteed to return the true closest vectors), but in most embedding use cases, it's good enough.
To build an IVFFlat index, you need to first decide how many lists (clusters) to use. You can experiment based on your dataset size, but a common starting point is around the square root of the number of vectors. Here’s how to create the index:
CREATE INDEX ON production.listing_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

ANALYZE production.listing_embeddings;
Keep in mind that you should run ANALYZE after creating an IVFFlat index so the planner has up-to-date statistics, and that the planner will generally only use the index when your query orders by the distance and includes a LIMIT clause; otherwise Postgres won't consider it. This is an easy thing to miss and a frustrating gotcha if you're wondering why indexing didn't help. With the index and fresh statistics in place, the earlier similarity query stays exactly the same:
SELECT
l.uid
,l.name
,le.chunk_index
,le.embedding <=> (SELECT ai.openai_embed('text-embedding-3-small', 'performance cruiser')) AS distance
FROM production.listing_embeddings le
JOIN production.listing l ON l.uid = le.listing_uid
ORDER BY distance
LIMIT 10;
Future-proofing your vector store by versioning embeddings
Versioning embeddings is also often overlooked early on but becomes essential as your system matures.
Embeddings are not static; they’re a product of the model used, the text preprocessing strategy, and even the chunking method. A change in any of these can shift the entire vector space, meaning similarity scores are no longer comparable across versions. If you’re experimenting with multiple models (say, comparing text-embedding-3-small with a domain-tuned MiniLM or E5 model), you’ll want to store embeddings separately or at least tag them clearly.
The simplest approach is to add a model version column to your embedding table and include it in your queries, but you can also go further by having versioned materialized views, separate indexes per model, or even embedding tables partitioned by version. The key is to avoid overwriting what you already have, and treat embeddings as reproducible artifacts tied to their generation method. That way, you preserve the ability to compare models, A/B test retrieval strategies, or roll back to a previous configuration without data loss. In regulated environments or systems where explainability matters, this becomes not just helpful, but necessary.
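A minimal sketch of that simplest approach, using the listing_embeddings table from earlier (the column name and default value are illustrative):

-- record which model produced each embedding
ALTER TABLE production.listing_embeddings
ADD COLUMN embedding_model text NOT NULL DEFAULT 'text-embedding-3-small';

-- note: storing several versions side by side also means widening the primary key
-- to (listing_uid, chunk_index, embedding_model) and handling differing dimensions

-- queries then pin a version so distances stay comparable
SELECT le.listing_uid, le.chunk_index
FROM production.listing_embeddings le
WHERE le.embedding_model = 'text-embedding-3-small'
ORDER BY le.embedding <=> (SELECT ai.openai_embed('text-embedding-3-small', 'performance cruiser'))
LIMIT 10;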
Wrapping up: PostgreSQL as a vector store done right
One of the best things about using PostgreSQL for embedding storage is that you can mix vector search with traditional SQL logic. Want to find the 10 boats most similar in design, but only ones for sale in a certain region and listed in the last two months? That’s a similarity search plus a simple WHERE clause. You don’t need to export anything to a separate vector store or deal with two inconsistent systems.
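As a sketch, and assuming the listing table has region and listed_date columns (hypothetical names), that query might look like:

SELECT
l.uid
,l.name
,le.embedding <=> (SELECT ai.openai_embed('text-embedding-3-small', 'performance cruiser')) AS distance
FROM production.listing_embeddings le
JOIN production.listing l ON l.uid = le.listing_uid
WHERE l.region = 'Pacific Northwest'                       -- hypothetical column
  AND l.listed_date >= current_date - interval '2 months'  -- hypothetical column
ORDER BY distance
LIMIT 10;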
In short, storing and querying embeddings in PostgreSQL doesn't have to be painful. Use the vector type, separate your schema when needed, index smartly, and query with a mix of similarity and business filters. When done right, you get the power of vector search and the richness of SQL, without the complexity of bolting on an external system or fighting against your infrastructure. And best of all, you stay close to your data, where it belongs.