How Vector Databases Search: A Practical Guide to IVF, HNSW, PQ & ScaNN

Surya

21 Nov 2025 — 4 min read

Introduction

Modern applications, from image similarity to RAG systems, run on fast vector search. When you embed text, audio, or images into high‑dimensional vectors, finding the closest matches becomes a problem of approximate nearest neighbor (ANN) search. This is where vector search engines and libraries such as FAISS, Milvus, Pinecone, Weaviate, and Elasticsearch step in.

But behind the scenes, these systems use very different indexing strategies. Some are clustering‑based (IVF), some graph‑based (HNSW), some compress aggressively (PQ), and some are hyper‑optimized via learned heuristics (ScaNN).

This post breaks down how these indexes work, why each exists, and how to choose the right one, without drowning in academic theory.

We'll use intuitive examples, simple diagrams, and real‑world constraints from Hoomanely’s own RAG + retrieval systems.

Why Vector Indexes Matter

A naive vector search compares the query against every embedding, which is O(N) work per query. For a million vectors, that is manageable; for a hundred million, latency explodes.
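As a baseline, the naive scan can be sketched in a few lines of standard-library Python. The function name `brute_force_search` and the toy data are illustrative assumptions, not taken from any particular database.

```python
import math
import random

def brute_force_search(query, vectors, k):
    """Exact k-nearest-neighbor search: compare the query to every vector."""
    scored = [(math.dist(query, v), i) for i, v in enumerate(vectors)]
    scored.sort()                      # O(N log N) here; the distance pass alone is O(N)
    return [i for _, i in scored[:k]]

random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(1000)]
query = vectors[42]                    # query with a vector we know is in the set
top = brute_force_search(query, vectors, k=3)
```

Every index discussed below is a way of avoiding most of those distance computations while still returning nearly the same result.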

ANN indexes accelerate search by:

Cutting the search space (IVF)
Navigating only promising regions (HNSW)
Compressing vectors to fit in RAM (PQ)
Reducing compute via learned pruning (ScaNN)

In systems like Hoomanely, where we handle thousands of pages per query across hundreds of veterinary PDFs, low-latency retrieval is core to a smooth user experience.

1. IVF - Inverted File Index

Concept

IVF partitions the vector space using clustering (usually k-means). Instead of searching all vectors, you search only a few clusters.

Think of it like a library:

First, shelves (clusters) are numbered.
When you search, you only walk to a handful of shelves.

How It Works

Run k-means to get nlist centroids.
Assign every vector to the nearest centroid.
At query time:
- Find the few centroids nearest to the query (the nprobe parameter).
- Search only the vectors assigned to those partitions.
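The steps above can be sketched with a small pure-Python IVF. This is a toy under stated assumptions: plain Lloyd's k-means, Euclidean distance, and hypothetical helper names (`build_ivf`, `ivf_search`); production libraries use heavily optimized versions of the same idea.

```python
import math
import random

def nearest(point, centroids, n=1):
    """Indices of the n centroids closest to point."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))
    return order[:n]

def kmeans(vectors, nlist, iters=10, seed=0):
    """Plain Lloyd's k-means; returns nlist centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            buckets[nearest(v, centroids)[0]].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_ivf(vectors, nlist):
    """Train centroids, then store each vector ID in its nearest centroid's list."""
    centroids = kmeans(vectors, nlist)
    lists = [[] for _ in range(nlist)]
    for i, v in enumerate(vectors):
        lists[nearest(v, centroids)[0]].append(i)
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    candidates = [i for c in nearest(query, centroids, nprobe) for i in lists[c]]
    candidates.sort(key=lambda i: math.dist(query, vectors[i]))
    return candidates[:k]

random.seed(1)
vectors = [[random.random() for _ in range(4)] for _ in range(500)]
centroids, lists = build_ivf(vectors, nlist=16)
query = vectors[7]
result = ivf_search(query, vectors, centroids, lists, k=5, nprobe=2)
```

Raising `nprobe` trades speed for recall: a true neighbor that sits just across a cluster boundary is only found if its list is among those probed.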

Mini Example

You create 1,024 clusters.
For each search, you probe 8.
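With evenly sized clusters, that scans roughly 8/1,024 of the data: up to a 128× reduction in distance computations, before counting the cost of comparing the query against the centroids.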
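A rough cost model makes the arithmetic concrete. It assumes equal-sized clusters and a dataset of one million vectors (an illustrative figure), and it counts the comparisons against the centroids as well as the scanned lists.

```python
def ivf_speedup(n_vectors, nlist, nprobe):
    """Estimated speedup over brute force, assuming equal-sized clusters."""
    centroid_comparisons = nlist                       # find the nearest centroids
    vector_comparisons = nprobe * n_vectors / nlist    # scan the probed lists
    return n_vectors / (centroid_comparisons + vector_comparisons)

speedup = ivf_speedup(n_vectors=1_000_000, nlist=1024, nprobe=8)
upper_bound = 1024 / 8   # ratio of clusters to probes, ignoring centroid comparisons
```

Under these assumptions the estimate comes out a little above 113×, so the nlist/nprobe ratio is best read as an upper bound rather than a guarantee. Skewed cluster sizes lower it further.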

Pros

Very memory‑efficient.
Easy to scale.
Works great with PQ (IVF‑PQ).

Cons

Quality depends heavily on clustering.
Less effective when the data has no natural cluster structure, because true neighbors end up spread across many partitions.

2. HNSW - Hierarchical Graph Index

Concept

HNSW builds a multi‑level navigable small‑world graph. Each node links to its nearest neighbors. Higher levels provide long‑range jumps; lower levels provide fine precision.

Imagine Google Maps but with flyover highways (top layers) + local roads (bottom layers).

How It Works

Build multiple layers; each upper layer has fewer nodes.
Insert each vector with random layer height.
During search:
- Start at top layer → move greedily toward the query.
- Drop layers until you reach the ground layer.
- Explore neighbors and return top‑K.
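The search procedure above can be sketched as follows. To keep it short, the graph construction here is a deliberate simplification and an assumption of this example: each layer is a brute-force M-nearest-neighbor graph, whereas real HNSW inserts vectors incrementally and prunes links with a heuristic. The search side (greedy descent, then a wider best-first search on the ground layer) follows the steps listed; the names `ef` and `m` mirror common HNSW parameter names.

```python
import heapq
import math
import random

def build_layers(vectors, m=8, seed=0):
    """Toy construction: random level per node, brute-force M-nearest links per layer."""
    rng = random.Random(seed)
    levels = [0] * len(vectors)
    for i in range(len(vectors)):
        while rng.random() < 0.25:     # each extra level is progressively less likely
            levels[i] += 1
    layers = []
    for level in range(max(levels) + 1):
        nodes = [i for i, l in enumerate(levels) if l >= level]
        graph = {}
        for i in nodes:
            others = sorted((j for j in nodes if j != i),
                            key=lambda j: math.dist(vectors[i], vectors[j]))
            graph[i] = others[:m]
        layers.append(graph)
    return layers

def greedy_step(query, vectors, graph, start):
    """Move to the closest neighbor until no neighbor improves on the current node."""
    current = start
    while True:
        best = min(graph[current], key=lambda j: math.dist(query, vectors[j]),
                   default=current)
        if math.dist(query, vectors[best]) >= math.dist(query, vectors[current]):
            return current
        current = best

def search_bottom(query, vectors, graph, start, k, ef):
    """Best-first search on layer 0, keeping the ef closest nodes seen so far."""
    d0 = math.dist(query, vectors[start])
    visited = {start}
    frontier = [(d0, start)]           # min-heap of nodes still to expand
    best = [(-d0, start)]              # max-heap (negated distances) of results
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) >= ef and d > -best[0][0]:
            break                      # nothing left can improve the result set
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(query, vectors[nb])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)
    return [i for _, i in sorted((-d, i) for d, i in best)[:k]]

def hnsw_search(query, vectors, layers, k=5, ef=32):
    entry = next(iter(layers[-1]))     # any node on the top layer
    for graph in reversed(layers[1:]): # greedy descent through the upper layers
        entry = greedy_step(query, vectors, graph, entry)
    return search_bottom(query, vectors, layers[0], entry, k, ef)

random.seed(2)
vectors = [[random.random() for _ in range(4)] for _ in range(300)]
layers = build_layers(vectors)
query = [0.5, 0.5, 0.5, 0.5]
result = hnsw_search(query, vectors, layers, k=5)
```

A larger `ef` widens the ground-layer search and usually raises recall at the cost of more distance computations. Results are approximate: nothing guarantees they match an exhaustive scan.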

Pros

Highest recall at low latency.
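Supports incremental inserts without a full rebuild; deletes are typically handled by marking nodes as removed, depending on the implementation.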
No expensive clustering.

Cons

Higher memory footprint.
Slow to build for large datasets.

Used by: Pinecone, Milvus, Weaviate, FAISS-HNSW.

3. PQ - Product Quantization

Concept

PQ compresses each vector into small discrete codes using quantization. You trade accuracy for significant memory savings.

Think of splitting a 768‑dim vector into 16 chunks of 48 dims each, then encoding each chunk using a small lookup table.
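To see what that buys in memory, here is the arithmetic for the 768-dim example. Two figures are assumptions not stated above: embeddings stored as float32, and 256 centroids per chunk so that each code fits in one byte.

```python
dims = 768
chunks = 16                 # m sub-vectors, as in the example above
bytes_per_float = 4         # assumption: float32 embeddings
bits_per_code = 8           # assumption: 256 centroids per chunk

raw_bytes = dims * bytes_per_float          # size of one uncompressed vector
code_bytes = chunks * bits_per_code // 8    # size of one PQ code
ratio = raw_bytes / code_bytes
```

Under these assumptions a 3,072-byte vector shrinks to a 16-byte code. Using more chunks lowers the compression ratio but usually improves accuracy, so the ratio you get in practice depends heavily on m.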

How It Works

Split each vector into m sub‑vectors.
Cluster each subspace separately.
Store only the cluster IDs.
During search:
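- Precompute a table of distances from each query sub‑vector to every centroid in the matching codebook.
- Estimate each stored vector's distance by summing the table entries for its codes.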
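A compact sketch of those steps, in standard-library Python. The helper names (`train_pq`, `encode`, `pq_search`) and the tiny sizes are illustrative assumptions; the structure (per-subspace codebooks, stored cluster IDs, a per-query lookup table) follows the list above.

```python
import math
import random

def kmeans(points, k, iters=8, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            c = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            buckets[c].append(p)
        for j, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def split(vector, m):
    """Cut a vector into m equal-length sub-vectors."""
    size = len(vector) // m
    return [vector[i * size:(i + 1) * size] for i in range(m)]

def train_pq(vectors, m, ksub):
    """One codebook (a list of ksub centroids) per subspace."""
    return [kmeans([split(v, m)[s] for v in vectors], ksub, seed=s) for s in range(m)]

def encode(vector, codebooks):
    """Replace each sub-vector with the ID of its nearest centroid."""
    return [min(range(len(cb)), key=lambda j: math.dist(sub, cb[j]))
            for sub, cb in zip(split(vector, len(codebooks)), codebooks)]

def pq_search(query, codes, codebooks, k):
    # Lookup table: squared distance from each query chunk to every centroid.
    table = [[math.dist(sub, c) ** 2 for c in cb]
             for sub, cb in zip(split(query, len(codebooks)), codebooks)]
    # Approximate squared distance = sum of the table entries picked by the codes.
    approx = [(sum(table[s][code[s]] for s in range(len(code))), i)
              for i, code in enumerate(codes)]
    return [i for _, i in sorted(approx)[:k]]

random.seed(3)
vectors = [[random.random() for _ in range(8)] for _ in range(200)]
codebooks = train_pq(vectors, m=4, ksub=16)
codes = [encode(v, codebooks) for v in vectors]
result = pq_search(vectors[0], codes, codebooks, k=10)
```

Note that the search never touches the original vectors: the query stays uncompressed while the database side is represented only by its codes. That asymmetry is why the estimates are fast, and also why they are approximate.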

Pros

Compresses vectors by 8×–32× in common configurations, and by more when fewer chunks or smaller codebooks are used.
Allows massive datasets to fit in RAM.
Works well with IVF (IVF‑PQ).

Cons

Lower recall compared to HNSW.
Codebook training matters a lot.

Used by: FAISS‑PQ, Milvus IVF‑PQ.

4. ScaNN - Google’s High‑Performance ANN

Concept

ScaNN (pronounced “scan”) mixes tree partitioning, anisotropic quantization, and learned pruning. Its strength lies in balancing speed and accuracy without excessive memory use.

How It Works

ScaNN typically uses:

Partitioning: tree‑like IVF to narrow search.
Score quantization: faster distance computations.
Residual reordering: rescoring top candidates exactly.

This makes it lightning‑fast on TPUs/CPUs.
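The score-then-rescore pattern in that list can be illustrated generically. The sketch below is not ScaNN's API and does not implement anisotropic quantization: it substitutes a crude scalar quantizer (a stand-in chosen for brevity), skips the partitioning stage, and uses hypothetical names such as `search_with_rescoring` and `reorder_k`. It only shows why an exact second pass over a short candidate list recovers accuracy lost to approximate scoring.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quantize(vector, levels=16):
    """Crude scalar quantization onto `levels` values in [-1, 1] (a stand-in quantizer)."""
    step = 2.0 / (levels - 1)
    return [round((min(max(x, -1.0), 1.0) + 1.0) / step) * step - 1.0 for x in vector]

def search_with_rescoring(query, vectors, quantized, k, reorder_k):
    # Stage 1: approximate inner-product scores against the quantized copies.
    approx = sorted(range(len(vectors)),
                    key=lambda i: dot(query, quantized[i]), reverse=True)
    shortlist = approx[:reorder_k]
    # Stage 2: exact scores, computed only for the shortlist.
    shortlist.sort(key=lambda i: dot(query, vectors[i]), reverse=True)
    return shortlist[:k]

random.seed(4)
vectors = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(500)]
quantized = [quantize(v) for v in vectors]   # a real system would store compact codes
query = [random.uniform(-1, 1) for _ in range(16)]
result = search_with_rescoring(query, vectors, quantized, k=5, reorder_k=50)
```

The size of the shortlist is the main knob: a larger `reorder_k` costs more exact computations but makes it less likely that a true top result was dropped in the approximate stage.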

Pros

High recall with moderate memory.
Great for text embeddings.
Strong performance in batch mode.

Cons

Harder to tune.
Not as common in production as HNSW.

Used in: Google Search, Vertex Matching Engine.

Comparing All Four

Index	Best For	Memory	Build Time	Recall	Typical Use
IVF	Large static datasets	Medium	Medium	Medium	RAG systems, batched search
HNSW	Real‑time, high‑recall search	High	Slow	High	Recommendations, semantic search
PQ	Huge datasets with tight RAM	Very Low	Medium	Low‑Medium	Image search, dedup, RAG archives
ScaNN	High‑speed text search	Medium	Fast	High	Web‑scale retrieval

Image Placeholder: "Spider chart comparing HNSW, IVF, PQ, and ScaNN across Recall, Speed, Memory, Ease."

How We Apply These at Hoomanely

At Hoomanely, our mission is to empower pet parents with real‑time, AI‑powered insights. A core part of this is our veterinary RAG pipeline, which retrieves the right pages from thousands of medical documents.

We’ve tested IVF, HNSW, and PQ combinations extensively. In internal benchmarks:

HNSW provides the strongest recall, especially for long clinical queries.
IVF‑PQ gives the best memory footprint, ideal for large document sets.
Hybrid setups (IVF‑HNSW) give fast retrieval on CPU‑only environments.

Choosing the right index helps us deliver faster answers and more accurate guidance for pet wellbeing.

Takeaways

IVF: great first choice for scalable, static datasets.
HNSW: best all‑rounder for high recall + dynamic inserts.
PQ: use when memory is tight or datasets are huge.
ScaNN: ideal for high‑speed text retrieval, especially in cloud environments.

The “best” index depends on your constraints: memory, latency, recall needs, and whether vectors change frequently.

