AI Stack Layer · 4 of 8

Vector Databases

Storage and search for embeddings — the high-dimensional vectors that capture the meaning of text, images, and audio. The memory of every RAG system.

Embeddings · ANN search · Semantic similarity · RAG backbone · Layer 4
Quick Facts

At a Glance

Basic Concepts

  • Embedding: a vector (e.g. 1024 floats) representing the meaning of some content.
  • Similarity: how close two vectors are, measured with cosine similarity, dot product, or Euclidean distance.
  • ANN (Approximate Nearest Neighbor): fast similarity search across millions of vectors.
  • Index: a data structure (HNSW, IVF, ScaNN) that makes search sub-linear.
  • Hybrid search combines vector similarity with keyword (BM25) matching for the best of both.
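A toy sketch of the similarity metrics above, using tiny hand-made 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity: the angle between two vectors, ignoring magnitude."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_dist(a, b):
    """L2 distance: smaller means more similar."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

# Toy "embeddings" — illustrative values only.
cat    = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.0]
car    = [0.0, 0.1, 0.9]

print(cosine_sim(cat, kitten))  # near 1.0 → semantically close
print(cosine_sim(cat, car))     # near 0.0 → unrelated
```

Either metric gives the same ordering here; in practice most embedding models are tuned for cosine or dot-product similarity.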
Landscape

The Major Options

| Database | Type | Why pick it |
| --- | --- | --- |
| Pinecone | Managed SaaS | Easiest production setup; auto-scaling, no ops. |
| Weaviate | Open-source / cloud | Built-in vectorizer modules; GraphQL API. |
| Qdrant | Open-source / cloud | Rust-built; fast filtering + payloads. |
| Milvus / Zilliz | Open-source / cloud | Battle-tested at billion-vector scale. |
| Chroma | Open-source / lite | Local-first; favorite for prototyping. |
| pgvector (Postgres) | Extension | Reuse your existing SQL DB; transactions + vectors. |
| Elasticsearch / OpenSearch | Search engine + vectors | Hybrid search; you may already run it. |
| Redis (RediSearch) | In-memory + vectors | Sub-millisecond ANN for hot data. |
| MongoDB Atlas Vector Search | Document DB + vectors | Vectors alongside JSON documents. |
| LanceDB | Embedded / Rust | File-based, Parquet-friendly, multi-modal. |
| FAISS | Library (Meta) | The original ANN library; powers many of the above. |
Mechanics

How They Work

Embeddings — From Text to Vectors

An embedding model (OpenAI text-embedding-3, Cohere Embed, Voyage, Nomic) takes a chunk of text and returns an N-dimensional vector. Semantically similar text → vectors that are close to each other.

from openai import OpenAI
client = OpenAI()
v = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I reset my password?",
).data[0].embedding   # list of 1536 floats
Indexing Algorithms
  • HNSW (Hierarchical Navigable Small World) — graph-based, very fast, the modern default.
  • IVF (Inverted File) — partition vectors into clusters, search only relevant ones.
  • PQ (Product Quantization) — compress vectors for memory savings.
  • ScaNN — Google's algorithm; powers Vertex AI Vector Search.
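To make the IVF idea concrete, here is a deliberately simplified sketch in NumPy: assign vectors to clusters, then search only the few clusters nearest the query. (A real IVF index trains centroids with k-means; this toy version just samples them, and all data is random.)

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 8)).astype(np.float32)

# "Train": pick k centroids. Real IVF runs k-means; here we sample at random.
k = 10
centroids = vectors[rng.choice(len(vectors), k, replace=False)]

# "Index": assign each vector to its nearest centroid (the inverted lists).
assignments = np.argmin(
    np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2), axis=1
)

def ivf_search(query, nprobe=2, top_k=3):
    """Scan only the nprobe clusters closest to the query, not all vectors."""
    probe = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    candidate_ids = np.where(np.isin(assignments, probe))[0]
    dists = np.linalg.norm(vectors[candidate_ids] - query, axis=1)
    return candidate_ids[np.argsort(dists)[:top_k]]

query = vectors[42] + 0.01  # a query very near a known vector
print(ivf_search(query))
```

The `nprobe` knob is the classic IVF trade-off: more clusters probed means better recall but slower search.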
Filtering & Metadata

You rarely just want "similar things" — you want similar things belonging to user X, in the past 30 days, of type Y. Modern vector DBs support metadata filters that prune the search space before (or during) the ANN search.
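A minimal illustration of pre-filtering, using an in-memory list of documents with metadata payloads (the field names and data here are invented for the example; real vector DBs apply the same idea inside the index):

```python
import numpy as np

# Toy corpus: each entry carries a vector plus metadata, like a DB payload.
docs = [
    {"vec": np.array([0.9, 0.1]), "user": "alice", "type": "note"},
    {"vec": np.array([0.8, 0.2]), "user": "bob",   "type": "note"},
    {"vec": np.array([0.1, 0.9]), "user": "alice", "type": "email"},
]

def filtered_search(query, top_k=2, **filters):
    """Prune on metadata first, then rank survivors by cosine similarity."""
    candidates = [
        d for d in docs
        if all(d.get(key) == value for key, value in filters.items())
    ]
    query = np.asarray(query, dtype=float)
    candidates.sort(
        key=lambda d: -(d["vec"] @ query)
        / (np.linalg.norm(d["vec"]) * np.linalg.norm(query))
    )
    return candidates[:top_k]

# Only alice's documents are considered, however similar bob's may be.
hits = filtered_search([1.0, 0.0], user="alice")
print([h["type"] for h in hits])
```

Filtering before (rather than after) the ANN search matters: post-filtering a top-N list can return fewer results than requested once the filter discards most of them.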

Hybrid Search & Re-ranking
  • Hybrid: combine BM25 keyword scores with vector similarity (better recall on rare terms / IDs).
  • Re-rankers: a second-pass cross-encoder (Cohere Rerank, Voyage Rerank) that re-orders the top-N for accuracy.
  • Reciprocal Rank Fusion (RRF) blends multiple result lists.
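RRF is simple enough to sketch in a few lines: each list contributes 1 / (k + rank) per document, with k = 60 as the commonly used constant (the document IDs below are made up):

```python
def rrf(result_lists, k=60, top_n=5):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

keyword_hits = ["doc7", "doc2", "doc9"]   # e.g. a BM25 ranking
vector_hits  = ["doc2", "doc5", "doc7"]   # e.g. an ANN ranking
print(rrf([keyword_hits, vector_hits]))
```

Documents appearing in both lists float to the top, which is exactly why RRF is a popular default for hybrid search: it needs no score normalization across the two systems.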
Choosing One
| Scenario | Pick |
| --- | --- |
| Already on Postgres, < 10M vectors | pgvector |
| Hosted, zero ops | Pinecone, Vertex AI Vector Search |
| Self-host, billions of vectors | Milvus, Qdrant |
| Already running Elasticsearch | Elasticsearch / OpenSearch |
| Local prototyping | Chroma, LanceDB |
| Need it on the edge | SQLite + sqlite-vec, LanceDB |