pgvector in Production: What the Tutorials Leave Out
When you read about pgvector, you get the idealized version: add the extension, store vectors, query by cosine similarity, done. Production use adds the parts the tutorials leave out — index tuning, embedding model choice, the hybrid search problem, and the edge cases that only appear when your knowledge base has been running long enough to accumulate real data.
Here is what actually using pgvector as the foundation of a knowledge retrieval system looks like after a few months of operation.
The Index Problem
pgvector supports two index types: IVFFlat and HNSW. The choice matters, and the tutorials don't always explain why.
IVFFlat (Inverted File with Flat compression) partitions the vector space into clusters and searches only the nearest clusters at query time. Fast, but approximate — it can miss the true nearest neighbor if the right entry happens to be in a cluster that wasn't searched. The lists parameter controls how many clusters you create; the probes parameter controls how many you search. Higher values improve recall at the cost of query time.
HNSW (Hierarchical Navigable Small World) builds a graph structure that allows faster approximate nearest-neighbor search with better recall characteristics than IVFFlat. It uses more memory and has slower build times, but query performance is better for most workloads.
For my knowledge base — a few thousand entries, frequently queried, not frequently updated — HNSW was the right choice:
CREATE INDEX knowledge_embedding_hnsw_idx
ON knowledge_entries
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
The m parameter controls the number of connections per node in the graph (more connections = better recall, more memory). ef_construction controls the size of the candidate list during index construction (higher = better quality index, slower build time). The defaults work; these values are tuned slightly toward recall quality over memory efficiency.
Hybrid Search: When to Use Which Mode
Vector similarity search and full-text search retrieve different things, and combining them is where practical retrieval quality improvement lives.
Vector search finds semantically related entries. Full-text search finds entries with specific words. For most knowledge queries, you want both: entries that are semantically related to your query and entries that contain the specific technical terms you're searching for.
The hybrid approach I settled on runs both searches and merges the results using Reciprocal Rank Fusion (RRF) — a simple method that combines ranked lists without requiring the scores to be on the same scale:
WITH vector_results AS (
SELECT id, content, metadata,
ROW_NUMBER() OVER (ORDER BY embedding <=> $1) as rank
FROM knowledge_entries
WHERE project_id = $2
ORDER BY embedding <=> $1
LIMIT 20
),
text_results AS (
SELECT id, content, metadata,
ROW_NUMBER() OVER (
ORDER BY ts_rank(to_tsvector('english', content),
plainto_tsquery('english', $3)) DESC
) as rank
FROM knowledge_entries
WHERE project_id = $2
AND to_tsvector('english', content) @@ plainto_tsquery('english', $3)
LIMIT 20
),
combined AS (
SELECT
COALESCE(v.id, t.id) as id,
COALESCE(v.content, t.content) as content,
COALESCE(v.metadata, t.metadata) as metadata,
COALESCE(1.0 / (60 + v.rank), 0) +
COALESCE(1.0 / (60 + t.rank), 0) as rrf_score
FROM vector_results v
FULL OUTER JOIN text_results t ON v.id = t.id
)
SELECT id, content, metadata
FROM combined
ORDER BY rrf_score DESC
LIMIT 5;
The constant 60 in the denominator is the standard RRF parameter — entries that appear in both result lists score higher than entries in only one. The top 5 results from this query are consistently better than the top 5 from either search mode alone.
The Embedding Quality Problem
Not all embeddings are equal for a given domain, and the quality difference shows up in retrieval. A general-purpose embedding model trained on web text may not represent technical domain vocabulary well. "IVFFlat" and "HNSW" might be close together in embedding space, or far apart, depending on how much the model has seen of vector database literature.
The practical test: take ten queries you actually run, run them against your knowledge base, and evaluate whether the top results are the ones you would have selected manually. If they're not, the embedding model is probably the culprit.
For my use case — data engineering domain knowledge with a mix of technical terminology and business domain vocabulary — the all-MiniLM-L6-v2 model performed well enough. The cases where it failed were edge cases where a technical term had a general English meaning that the model weighted too heavily. Hybrid search with keyword fallback caught most of those.
The Operational Reality
pgvector running in PostgreSQL is just a database. It has the operational characteristics of a database: it needs disk space, it benefits from appropriate autovacuum settings, it should be backed up. Nothing exotic, which is exactly the point.
Adding a vector store to a project that already runs Postgres means adding one extension, one table, one index. The operational overhead delta is negligible. The alternative — running a dedicated vector database service — adds another system to monitor, another service to restart when things go wrong, another API to integrate with. For a system at the scale I'm running, that overhead isn't justified by any capability advantage the dedicated service would provide.
The rule I've landed on: if you already run Postgres, pgvector is almost always the right choice for vector search at any scale where Postgres itself scales. You're not paying the operational cost of a separate vector database. You're not splitting your data across multiple storage systems. And you get joins — the ability to filter by structured metadata before the vector search, which is the most important performance optimization available.
If you're evaluating the dedicated vector database vs. pgvector question for your own setup, I'm happy to dig into the tradeoffs in your specific context. As always, I'm here to help.