April 2024, Revisited: RAG Grows Up on Postgres

Eric Greene, June 11, 2026

Our Three-Year Retrospective reaches April 2024, a month that crystallized two trends at once: retrieval-augmented generation settling in as the default enterprise LLM architecture, and PostgreSQL — via the pgvector extension — emerging as the place a great many teams decided to run it. Neither trend announced itself with a keynote. Both reshaped how production AI systems got built.

Why RAG won the enterprise

By spring 2024, organizations had spent a year learning the same lesson independently: a general-purpose model does not know your data, and the two ways to fix that were fine-tuning and retrieval. Retrieval kept winning. RAG (embed your documents, store the vectors, retrieve the relevant chunks at query time, and hand them to the model as context) meant knowledge could be updated by updating an index instead of retraining anything; answers could cite their sources; and access control could be enforced at retrieval time, so the model only ever saw what the requesting user was allowed to see. Fine-tuning kept a role for style and task-shaping, but for knowledge, retrieval was cheaper, fresher, and auditable. The expanding context windows of the era's models did not kill the pattern, despite predictions that they would: stuffing a million tokens of mostly irrelevant text into every request was slower, costlier, and often less accurate than retrieving the right five passages.
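The pipeline in that paragraph fits in a few dozen lines. What follows is a toy sketch, not a production design: `embed` is a hypothetical word-count stand-in for a real embedding model, an in-memory list stands in for the vector store, and `Chunk`, `build_index`, `retrieve`, and `build_prompt` are illustrative names. The detail worth noticing is where the access-control filter sits: before ranking, so a chunk the user may not see can never reach the prompt.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical stand-in for an embedding model: count a few vocabulary words.
# A real pipeline would call an embedding model and get dense vectors back.
VOCAB = ["refund", "return", "salary", "bonus", "vacation", "policy"]


def embed(text: str) -> list[float]:
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # groups permitted to see this chunk


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(chunks: list[Chunk]) -> list[tuple[Chunk, list[float]]]:
    # "Store the vectors": embed once at ingest time. Updating knowledge means
    # re-embedding changed chunks, not retraining a model.
    return [(chunk, embed(chunk.text)) for chunk in chunks]


def retrieve(index, question: str, user_groups: set[str], k: int = 2) -> list[Chunk]:
    query_vec = embed(question)
    # Access control first: chunks outside the user's groups are dropped
    # before ranking, so they can never reach the prompt.
    visible = [(c, v) for c, v in index if c.allowed_groups & user_groups]
    visible.sort(key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [c for c, _ in visible[:k]]


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    # Tag each passage with its id so the answer can cite its sources.
    context = "\n".join(f"[{c.doc_id}] {c.text}" for c in chunks)
    return (
        "Answer using only the sources below and cite their ids.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


SAMPLE_CHUNKS = [
    Chunk("kb-1", "Refund policy: a refund is issued within 14 days of a return.",
          frozenset({"everyone"})),
    Chunk("hr-7", "Salary bands and bonus policy for engineering.", frozenset({"hr"})),
    Chunk("kb-2", "Vacation policy: 25 vacation days per year.", frozenset({"everyone"})),
]

if __name__ == "__main__":
    index = build_index(SAMPLE_CHUNKS)
    question = "What is the refund policy?"
    hits = retrieve(index, question, {"everyone"})
    print(build_prompt(question, hits))  # this string is what the model would receive
```

Asking "What is the refund policy?" as a member of `everyone` returns `kb-1` and then `kb-2`; a user in both `everyone` and `hr` gets `hr-7` in second place instead, because only that user's groups admit it.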

The architecture pulled a new pipeline into existence alongside it: chunking strategies (the unglamorous decision that dominated answer quality); embedding model selection, along with the realization that swapping models means re-embedding everything, because vectors produced by different models are not comparable; and evaluation harnesses that answer "did retrieval actually find the right thing?" separately from "did the model use it well?" The recurring engagement of that season was helping teams discover that RAG quality is mostly retrieval quality.
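Scoring retrieval on its own is what separates those two questions. A minimal sketch, assuming a hand-labeled gold set that maps each test query to the chunk ids that answer it (the function names and data shapes are hypothetical): recall@k measures how much of the needed evidence made it into the top k results, and hit rate@k measures how often at least one relevant chunk did. No model is involved, so a low score here points at chunking, embedding, or indexing rather than at generation.

```python
from collections.abc import Mapping, Sequence


def recall_at_k(retrieved: Sequence[str], relevant: set[str], k: int) -> float:
    """Share of the relevant chunk ids that appear in the top-k results."""
    if not relevant:
        raise ValueError("every gold query needs at least one relevant chunk id")
    return len(relevant & set(retrieved[:k])) / len(relevant)


def evaluate_retrieval(
    runs: Mapping[str, Sequence[str]],  # query -> ranked chunk ids from the retriever
    gold: Mapping[str, set[str]],       # query -> chunk ids a human marked relevant
    k: int = 5,
) -> dict[str, float]:
    """Average recall@k and hit rate@k over the gold queries."""
    if not gold:
        raise ValueError("gold set is empty")
    # A query the retriever never answered counts as retrieving nothing.
    recalls = [recall_at_k(runs.get(q, []), relevant, k) for q, relevant in gold.items()]
    hits = sum(1 for r in recalls if r > 0)
    return {
        f"recall@{k}": sum(recalls) / len(recalls),
        f"hit_rate@{k}": hits / len(recalls),
    }
```

Tracking both numbers matters: hit rate can look healthy while recall stays low, which usually means answers that need several passages are being built from only one of them.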

pgvector and the case for boring infrastructure

The vector-database land grab of 2023 had assumed vector search needed new infrastructure. pgvector made the counterargument: vector search is an index type, not a product category. The extension's 0.5.0 release in August 2023 had added HNSW indexes, the graph-based approximate-nearest-neighbor structure the dedicated engines used, as an alternative to exact scans and the existing IVFFlat index, and benchmark gaps with those engines narrowed dramatically. The 0.6.0 release in early 2024 made HNSW builds parallel, turning index construction from an overnight job into a coffee break, and 0.7.0, released at the end of April, added half-precision and sparse vector types along with binary quantization. By April 2024, for the corpus sizes most enterprises actually had (millions of vectors, not billions), pgvector was simply fast enough.
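To make the index discussion concrete, here is a minimal sketch of the setup a team might have run on pgvector 0.6.0 or later, written as Python holding SQL statements so it can be applied through any DB-API connection (psycopg, for example). The table name `documents`, its columns, the 1536-dimension embedding, and the memory and worker settings are hypothetical; `m = 16` and `ef_construction = 64` are pgvector's documented defaults, spelled out only to show where they go.

```python
# Hypothetical schema for illustration; match the vector dimension to your
# embedding model and adjust names to your own tables.
SCHEMA_STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    """
    CREATE TABLE IF NOT EXISTS documents (
        id         bigserial PRIMARY KEY,
        tenant_id  bigint   NOT NULL,
        title      text     NOT NULL,
        body       text     NOT NULL,
        published  boolean  NOT NULL DEFAULT false,
        embedding  vector(1536),
        tsv        tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED
    )
    """,
    # Session settings for the build: more memory, and (pgvector 0.6.0+)
    # parallel workers for HNSW construction, in addition to the leader.
    "SET maintenance_work_mem = '2GB'",
    "SET max_parallel_maintenance_workers = 7",
    # HNSW graph index using cosine distance, i.e. the <=> operator.
    """
    CREATE INDEX IF NOT EXISTS documents_embedding_hnsw
        ON documents USING hnsw (embedding vector_cosine_ops)
        WITH (m = 16, ef_construction = 64)
    """,
]


def apply_schema(conn) -> None:
    """Run each statement in order on a DB-API connection, then commit."""
    with conn.cursor() as cur:
        for statement in SCHEMA_STATEMENTS:
            cur.execute(statement)
    conn.commit()
```

With psycopg 3 installed, something like `apply_schema(psycopg.connect("dbname=app"))` would run it (the connection string is a placeholder). The main query-time knob is separate: `SET hnsw.ef_search = 100` (the default is 40) widens the candidate list, trading some speed for recall.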

What it offered that dedicated engines could not: your vectors live in the same database as your data, in the same transactions. One query could combine vector similarity with SQL predicates such as WHERE tenant_id = $1 AND published, join against your real tables, and fold in tsvector full-text search for hybrid retrieval, all in the same statement and all under the same backups, replication, and access controls your team already operated. No sync pipeline between your system of record and your search index, and no new database to page someone about at 3 a.m. Our advice that spring was blunt: start on pgvector; let measured scale, not architecture-diagram aesthetics, force you off it. Most teams never got forced off.
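A minimal sketch of that single statement, again as Python wrapping SQL for a DB-API driver such as psycopg. It assumes a hypothetical `documents` table with `id`, `title`, `tenant_id`, `published`, a pgvector `embedding` column, and a `tsvector` column named `tsv`. The fusion step is reciprocal rank fusion with the commonly used constant 60; that is our choice for illustration, not something pgvector prescribes, and weighted score blending is an equally reasonable alternative.

```python
from collections.abc import Sequence

# Hypothetical schema: documents(id, title, tenant_id, published,
# embedding vector(n), tsv tsvector). Placeholders use psycopg's %(name)s style.
HYBRID_SQL = """
WITH semantic AS (
    -- Nearest neighbours by cosine distance, restricted to this tenant.
    SELECT id, row_number() OVER (ORDER BY dist) AS rank
    FROM (
        SELECT id, embedding <=> %(qvec)s::vector AS dist
        FROM documents
        WHERE tenant_id = %(tenant)s AND published
        ORDER BY embedding <=> %(qvec)s::vector
        LIMIT 20
    ) AS nearest
),
lexical AS (
    -- Full-text matches ranked by ts_rank_cd, same tenant filter.
    SELECT id, row_number() OVER (ORDER BY ts_rank_cd(tsv, query) DESC) AS rank
    FROM documents, plainto_tsquery('english', %(qtext)s) AS query
    WHERE tenant_id = %(tenant)s AND published AND tsv @@ query
    ORDER BY ts_rank_cd(tsv, query) DESC
    LIMIT 20
)
-- Reciprocal rank fusion: each list contributes 1 / (60 + rank).
SELECT d.id, d.title,
       coalesce(1.0 / (60 + s.rank), 0) + coalesce(1.0 / (60 + l.rank), 0) AS score
FROM semantic AS s
FULL OUTER JOIN lexical AS l ON s.id = l.id
JOIN documents AS d ON d.id = coalesce(s.id, l.id)
ORDER BY score DESC
LIMIT %(k)s
"""


def vector_literal(values: Sequence[float]) -> str:
    """Format floats in pgvector's text input form, e.g. [0.1,0.2]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def hybrid_search(conn, tenant_id: int, query_text: str,
                  query_embedding: Sequence[float], k: int = 5) -> list[tuple]:
    """Run the fused vector + full-text query on a DB-API connection."""
    params = {
        "tenant": tenant_id,
        "qtext": query_text,
        "qvec": vector_literal(query_embedding),  # cast to vector inside the SQL
        "k": k,
    }
    with conn.cursor() as cur:
        cur.execute(HYBRID_SQL, params)
        return cur.fetchall()
```

Because the tenant filter lives in the one statement, there is no second system where it could be forgotten. One caveat from pgvector's documentation: with an approximate index, the WHERE clause is applied to the candidates the index returns, so a very selective filter can leave fewer than 20 semantic hits; raising `hnsw.ef_search` helps.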

Looking back from June 2026

Both bets held. RAG matured from pattern into plumbing. It was refined with hybrid retrieval, rerankers, and better evaluation, and absorbed into the agent era, where retrieval became a tool agents call rather than a fixed pre-processing step, but the fundamental architecture is recognizably the one that hardened in early 2024. Postgres-as-vector-store, meanwhile, went from contrarian to conventional: pgvector kept improving (quantization, better filtering, steady performance work), and "just use Postgres" proved as durable for embeddings as it had for queues, JSON, and full-text search before them. The dedicated vector databases found their niches at genuinely large scale; the median RAG system runs on the database the team already had.

We teach this stack end-to-end in Production RAG Systems (chunking, embedding pipelines, hybrid retrieval, evaluation, and the operational concerns demos skip) and the Postgres foundation underneath it in PostgreSQL for Python Programmers, which covers pgvector alongside everything else that has made Postgres the default answer since the month this post remembers.