PGVector: Vector Embeddings in PostgreSQL | Generated by AI
PGVector is a PostgreSQL extension that adds support for storing and querying vector embeddings (arrays of numbers, often 128–4096 dimensions) inside a Postgres database.
It’s commonly used in AI and machine learning applications where you need to store and search embeddings generated by models from OpenAI, Hugging Face, or other embedding providers. For example, you might embed documents, code snippets, or product descriptions into vectors, then use PGVector to quickly find the ones most similar to a user’s query.
Key Features
- Vector data type: Adds a `vector` column type to Postgres tables.
- Similarity search: Supports operations like cosine similarity, Euclidean distance, or inner product to compare vectors.
- Indexes for speed: Provides `ivfflat` and `hnsw` index types for efficient approximate nearest neighbor (ANN) search, allowing fast retrieval even with millions of embeddings.
- SQL integration: Because it runs inside Postgres, you can combine vector search with normal SQL queries, joins, and filters.
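To make the three comparison measures concrete, here is a minimal pure-Python sketch of what each one computes. The function names are illustrative and not part of PGVector; inside Postgres the extension does this work itself, so the code is only meant to show the arithmetic.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance: straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Inner (dot) product: larger values mean more similar vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """Cosine distance, i.e. 1 - cosine similarity; ignores vector length."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


if __name__ == "__main__":
    a, b = [1.0, 2.0], [2.0, 4.0]  # same direction, different length
    print(l2_distance(a, b))       # non-zero: the points are far apart
    print(cosine_distance(a, b))   # approximately 0: the directions match
```

Cosine distance is a common choice when only the direction of an embedding matters, while Euclidean distance also takes magnitude into account.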
Example Usage
-- Enable the extension (once per database)
CREATE EXTENSION IF NOT EXISTS vector;
-- Create table with vector column
CREATE TABLE items (
id bigserial PRIMARY KEY,
embedding vector(1536) -- dimension must match your model
);
-- Insert a row
INSERT INTO items (embedding) VALUES ('[0.25, 0.1, ...]');
-- Search for most similar embedding
SELECT id, embedding
FROM items
ORDER BY embedding <-> '[0.24, 0.11, ...]'
LIMIT 5;
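Here `<->` is PGVector's Euclidean (L2) distance operator, so smaller values mean more similar vectors. The extension also provides `<=>` for cosine distance and `<#>` for negative inner product.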
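As a rough mental model, the query above behaves like the following pure-Python sketch: sort every row by Euclidean distance to the query vector and keep the closest few. The helper names (`to_vector_literal`, `nearest`) and the tiny 3-dimensional sample data are made up for illustration. This is the exact, brute-force version of the search; an ANN index trades some accuracy for speed.

```python
import math


def to_vector_literal(values):
    """Format numbers in the bracketed text form used for vector values in SQL."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def nearest(items, query, k=5):
    """Return the ids of the k embeddings closest to `query` by L2 distance.

    `items` maps id -> embedding, standing in for the rows of the table.
    """
    ranked = sorted(items.items(), key=lambda kv: math.dist(kv[1], query))
    return [item_id for item_id, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical toy data; real embeddings have hundreds or thousands of dimensions.
    items = {
        1: [0.25, 0.1, 0.5],
        2: [0.9, 0.8, 0.1],
        3: [0.2, 0.15, 0.45],
    }
    query = [0.24, 0.11, 0.5]
    print(to_vector_literal(query))
    print(nearest(items, query, k=2))
```

In an application, a string produced the way `to_vector_literal` does would typically be passed as a query parameter rather than pasted into the SQL text.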
Why It Matters
PGVector lets you build an entire vector database directly in Postgres, rather than using a separate specialized vector DB (like Pinecone, Weaviate, or Milvus). That’s attractive if you already use Postgres and want to keep all data and queries in one place.