pg_textsearch: Native BM25 Full-Text Search in Postgres

Fletcher

25-Minute Talk

Full-text search is table stakes for modern applications, yet Postgres users often face a difficult choice: bolt on Elasticsearch and accept the operational complexity, or settle for Postgres's built-in text search with its limited ranking capabilities. pg_textsearch offers a third path—a permissively licensed Postgres extension that brings production-grade BM25 ranking directly into your database.

This talk covers the design and implementation of pg_textsearch, a new open-source extension that implements an LSM-tree-inspired architecture within Postgres's access method framework. We'll explore how the extension manages an in-memory memtable that spills to immutable disk segments, handles concurrent access through Postgres's shared memory primitives, and maintains crash recovery guarantees using only native Postgres storage.
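As a rough illustration of the memtable-and-segments idea described above, here is a toy sketch in Python. It is not the extension's code: the class name, the memtable_limit knob, and the dictionary-based segments are all hypothetical, and it leaves out shared memory, concurrency, on-disk storage, and crash recovery entirely.

```python
from collections import defaultdict


class ToyInvertedIndex:
    """Toy LSM-style inverted index: one mutable memtable plus immutable segments.

    Hypothetical illustration only; not how pg_textsearch stores data.
    """

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # max postings buffered before a spill (made-up knob)
        self.memtable = defaultdict(list)     # term -> list of doc ids
        self.memtable_size = 0                # number of postings currently buffered
        self.segments = []                    # frozen {term: (doc_id, ...)} dicts, oldest first

    def add(self, doc_id, tokens):
        # Record one posting per distinct term in the document.
        for term in set(tokens):
            self.memtable[term].append(doc_id)
            self.memtable_size += 1
        if self.memtable_size >= self.memtable_limit:
            self.spill()

    def spill(self):
        # Freeze the memtable into a segment that is never modified again.
        if not self.memtable:
            return
        segment = {term: tuple(sorted(ids)) for term, ids in self.memtable.items()}
        self.segments.append(segment)
        self.memtable = defaultdict(list)
        self.memtable_size = 0

    def lookup(self, term):
        # A read must consult every segment and the live memtable.
        doc_ids = []
        for segment in self.segments:
            doc_ids.extend(segment.get(term, ()))
        doc_ids.extend(self.memtable.get(term, ()))
        return sorted(doc_ids)
```

Writes only ever touch the memtable, which is what makes the structure write-friendly; the cost is that reads have to merge results from several places.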

Key topics include:

  • Why BM25 still outperforms simpler ranking functions for keyword search
  • The challenges of building a write-optimized inverted index inside Postgres
  • Fieldnorm quantization and block-based storage for memory efficiency
  • Block-Max WAND for sub-linear top-k retrieval
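For readers new to BM25, the scoring function itself is compact. The sketch below is the textbook Okapi BM25 formula with a Lucene-style non-negative IDF, written in Python for illustration; it is not taken from pg_textsearch, and the function name and the defaults k1=1.2 and b=0.75 are common conventions assumed here rather than the extension's settings.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized when b > 0.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Rare terms get a higher weight; this IDF variant never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates: each repeat adds less than the previous one.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```

The three ingredients that simpler ranking functions tend to lack are visible here: a corpus-wide rarity weight (idf), diminishing returns on repeated terms (controlled by k1), and document-length normalization (controlled by b).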

Whether you're considering adding search to your Postgres application or curious about the internals of building a Postgres extension, this talk provides both practical guidance and deep technical insight.
