Pushing the Limits of the Index API: Building a Columnar Store without a TAM

Labatt

50-Minute Talk

Postgres is famously row-oriented. To achieve analytical performance, most modern approaches add columnar storage through a Table Access Method (TAM) or delegate to external formats such as Apache Iceberg. But is it possible to achieve columnar speed while remaining entirely within standard Postgres heaps and the shared buffer manager?

In this talk, we explore the implementation of a high-performance columnar store built entirely on an Index Access Method (IAM) and Custom Scans. Avoiding a separate storage engine gave us unique advantages, but it also imposed significant architectural constraints.

We will dive into the C- and Rust-level implementation details of pg_search, focusing on:

  • Execution strategies: Implementing late materialization for Top-K queries and Joins to minimize tuple reconstruction costs.
  • Optimization: Using dynamic filtering and sideways information passing to push filters down closer to storage.
  • SIMD & Batching: Techniques used to accelerate columnar access within the standard block storage format.
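To make the first bullet concrete, here is a minimal sketch of late materialization for a Top-K query, assuming scores live in a single column addressed by row id. It is an illustration of the general technique, not pg_search's code: the Top-K pass reads only the score column, and full rows are reconstructed only for the K winners. The names `top_k_ids`, `fetch_row`, and `Row` are hypothetical, and `fetch_row` merely stands in for an expensive tuple fetch. The smallest score retained in the heap also acts as a running threshold that rejects most candidates cheaply, loosely analogous to the dynamic filtering mentioned in the second bullet.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical full row; building one stands in for an expensive tuple reconstruction.
#[derive(Debug)]
struct Row {
    id: u32,
    title: String,
}

/// Return the ids of the `k` highest-scoring rows, reading only the score column.
/// A min-heap of size `k` holds the current best `k`; its minimum is the running threshold.
fn top_k_ids(scores: &[u32], k: usize) -> Vec<u32> {
    let mut heap: BinaryHeap<Reverse<(u32, u32)>> = BinaryHeap::with_capacity(k);
    for (row_id, &score) in scores.iter().enumerate() {
        let entry = Reverse((score, row_id as u32));
        if heap.len() < k {
            heap.push(entry);
        } else if let Some(&Reverse((min_score, _))) = heap.peek() {
            // Only candidates that beat the current threshold touch the heap.
            if score > min_score {
                heap.pop();
                heap.push(entry);
            }
        }
    }
    // Order winners by descending score (ties broken by larger row id first).
    let mut winners: Vec<(u32, u32)> = heap.into_iter().map(|Reverse(e)| e).collect();
    winners.sort_by(|a, b| b.cmp(a));
    winners.into_iter().map(|(_, id)| id).collect()
}

/// Hypothetical late-materialization step: build a full row for one winning id.
fn fetch_row(id: u32, titles: &[&str]) -> Row {
    Row { id, title: titles[id as usize].to_string() }
}

fn main() {
    let scores = [7, 42, 3, 99, 15, 42];
    let titles = ["a", "b", "c", "d", "e", "f"];
    let ids = top_k_ids(&scores, 3);
    // Only the k winning rows are materialized; the other rows are never reconstructed.
    let rows: Vec<Row> = ids.iter().map(|&id| fetch_row(id, &titles)).collect();
    for row in &rows {
        println!("{} {}", row.id, row.title);
    }
}
```

With K = 3 the sketch materializes only rows 3, 5, and 1; the remaining rows are rejected using nothing but their scores.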

Finally, we will provide a retrospective on the IAM and Custom Scan APIs. We will discuss the specific limitations we encountered and the workarounds we engineered, and propose improvements to these APIs to better support future analytical extensions. This session is designed for extension developers and core contributors interested in the limits of Postgres extensibility.
