21. LNPP + pgvector Vector Search Stack#
flowchart LR
Client[Client] --> Nginx[Nginx<br/>API Gateway]
Nginx --> FastAPI[FastAPI<br/>Python App]
FastAPI --> PG[(PostgreSQL<br/>+ pgvector)]
FastAPI --> Embed[Embedding Model<br/>OpenAI / Local]
subgraph RAG["RAG Pipeline"]
Doc[Documents] --> Chunk[Chunking]
Chunk --> Embed
Embed --> PG
end
Overview#
pgvector is a PostgreSQL extension for vector similarity search, perfect for AI/RAG applications. Combined with Nginx + PostgreSQL + Python (FastAPI), it forms a powerful stack for semantic search and AI-powered applications.
PostgreSQL + pgvector Setup#
Docker Compose#
services:
postgres:
image: pgvector/pgvector:pg16
environment:
POSTGRES_DB: vectordb
POSTGRES_USER: app
POSTGRES_PASSWORD: ${PG_PASSWORD:-apppass}
volumes:
- pg_data:/var/lib/postgresql/data
- ./init.sql:/docker-entrypoint-initdb.d/init.sql
ports:
- "5432:5432"
healthcheck:
test: ["CMD-SHELL", "pg_isready -U app -d vectordb"]
interval: 5s
timeout: 3s
retries: 5
app:
build: ./app
environment:
- DATABASE_URL=postgresql+asyncpg://app:${PG_PASSWORD:-apppass}@postgres:5432/vectordb
- OPENAI_API_KEY=${OPENAI_API_KEY}
ports:
- "8000:8000"
depends_on:
postgres:
condition: service_healthy
nginx:
image: nginx:1.25-alpine
ports:
- "80:80"
volumes:
- ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
depends_on:
- app
volumes:
pg_data:
Initialize pgvector#
-- init.sql
CREATE EXTENSION IF NOT EXISTS vector;
-- Documents table with vector column
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
metadata JSONB DEFAULT '{}',
embedding vector(1536), -- OpenAI ada-002 dimension
created_at TIMESTAMP DEFAULT NOW()
);
-- HNSW index (recommended for most cases)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- IVFFlat index (alternative, faster build)
-- CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
-- WITH (lists = 100);
Vector Search Queries#
-- Cosine similarity search (most common)
SELECT id, title, content,
1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
-- L2 distance search
SELECT id, title, content
FROM documents
ORDER BY embedding <-> $1::vector
LIMIT 10;
-- Inner product search
SELECT id, title, content
FROM documents
ORDER BY embedding <#> $1::vector
LIMIT 10;
-- Filtered search
SELECT id, title, content,
1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE metadata->>'category' = 'tech'
AND created_at > NOW() - INTERVAL '30 days'
ORDER BY embedding <=> $1::vector
LIMIT 10;
Index Comparison#
Index |
Build Speed |
Query Speed |
Memory |
Best For |
|---|---|---|---|---|
HNSW |
Slow |
Fast |
High |
Production, < 10M vectors |
IVFFlat |
Fast |
Medium |
Low |
Large datasets, frequent rebuilds |
None (exact) |
N/A |
Slow |
Low |
Small datasets (< 10K) |
FastAPI + pgvector Application#
# app/main.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.orm import sessionmaker
from sqlalchemy import text
import openai
import os
app = FastAPI(title="Vector Search API")
DATABASE_URL = os.getenv("DATABASE_URL")
engine = create_async_engine(DATABASE_URL)
async_session = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)
client = openai.AsyncOpenAI()
class DocumentCreate(BaseModel):
title: str
content: str
metadata: dict = {}
class SearchQuery(BaseModel):
query: str
top_k: int = 5
async def get_embedding(text: str) -> list[float]:
response = await client.embeddings.create(
model="text-embedding-ada-002",
input=text
)
return response.data[0].embedding
@app.post("/documents")
async def create_document(doc: DocumentCreate):
embedding = await get_embedding(doc.content)
async with async_session() as session:
await session.execute(
text("""
INSERT INTO documents (title, content, metadata, embedding)
VALUES (:title, :content, :metadata, :embedding)
"""),
{
"title": doc.title,
"content": doc.content,
"metadata": doc.metadata,
"embedding": str(embedding),
}
)
await session.commit()
return {"status": "created"}
@app.post("/search")
async def search(query: SearchQuery):
embedding = await get_embedding(query.query)
async with async_session() as session:
result = await session.execute(
text("""
SELECT id, title, content, metadata,
1 - (embedding <=> :embedding::vector) AS similarity
FROM documents
ORDER BY embedding <=> :embedding::vector
LIMIT :top_k
"""),
{"embedding": str(embedding), "top_k": query.top_k}
)
rows = result.fetchall()
return [
{
"id": r.id,
"title": r.title,
"content": r.content,
"similarity": round(r.similarity, 4),
}
for r in rows
]
@app.get("/health")
async def health():
return {"status": "ok"}
Dockerfile#
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN useradd -r appuser
USER appuser
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
requirements.txt#
fastapi>=0.110
uvicorn[standard]>=0.27
sqlalchemy[asyncio]>=2.0
asyncpg>=0.29
openai>=1.12
pgvector>=0.2
K8s Deployment#
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres-vector
spec:
serviceName: postgres-vector
replicas: 1
selector:
matchLabels:
app: postgres-vector
template:
metadata:
labels:
app: postgres-vector
spec:
containers:
- name: postgres
image: pgvector/pgvector:pg16
ports:
- containerPort: 5432
env:
- name: POSTGRES_DB
value: vectordb
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: pg-secret
key: password
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "2"
memory: "4Gi"
volumeMounts:
- name: data
mountPath: /var/lib/postgresql/data
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 50Gi
Performance Tips#
Use HNSW index for production (better recall, faster queries)
Set
maintenance_work_memhigh during index builds:SET maintenance_work_mem = '2GB'Use
shared_buffers = 25%of available RAMBatch inserts for bulk loading (use
COPYor batchINSERT)Monitor index size:
SELECT pg_size_pretty(pg_relation_size('documents_embedding_idx'))