Skip to content

pgvector HNSW Index Maintenance

Engramia uses pgvector's HNSW (Hierarchical Navigable Small World) index for approximate nearest-neighbor search on pattern embeddings.

Index parameters

Set during migration 001_initial:

Parameter Value Description
m 16 Max connections per node (higher = better recall, more memory)
ef_construction 64 Build-time search width (higher = better index quality, slower build)

These are good defaults for up to ~100k patterns. For larger stores, consider increasing m to 32 and ef_construction to 128.

When to rebuild

The HNSW index quality can degrade after heavy churn (many deletes + inserts). Rebuild when:

  • Recall quality drops noticeably (patterns you know exist aren't returned)
  • After bulk deletion (e.g. retention policy pruned >50% of patterns)
  • After a full reindex (engramia reindex)
  • After migrating from JSON to PostgreSQL (engramia migrate)

How to rebuild

-- Check current index size
SELECT pg_size_pretty(pg_relation_size('ix_memory_embeddings_hnsw'));

-- Rebuild the HNSW index (blocks writes during rebuild)
REINDEX INDEX CONCURRENTLY ix_memory_embeddings_hnsw;

-- If CONCURRENTLY fails, use the blocking version:
-- REINDEX INDEX ix_memory_embeddings_hnsw;

Via Docker:

docker compose exec postgres psql -U engramia -d engramia \
  -c "REINDEX INDEX CONCURRENTLY ix_memory_embeddings_hnsw;"

Monitoring index health

-- Index size vs table size
SELECT
  pg_size_pretty(pg_relation_size('memory_embeddings')) AS table_size,
  pg_size_pretty(pg_relation_size('ix_memory_embeddings_hnsw')) AS index_size;

-- Row count
SELECT count(*) FROM memory_embeddings;

Performance tuning

For search queries, pgvector uses ef_search (runtime parameter, default 40):

-- Increase for better recall at the cost of latency
SET hnsw.ef_search = 100;

-- Check current value
SHOW hnsw.ef_search;

Set globally in postgresql.conf or per-session for tuning.

Vacuum

PostgreSQL's autovacuum handles routine maintenance. After large bulk operations, run a manual vacuum:

VACUUM ANALYZE memory_embeddings;