Can We Skip Recovery? The Architecture of On-Demand WAL Replay

Labatt

25-Minute Talk

After a crash, Postgres availability is bound by the volume of WAL generated since the last checkpoint: the server cannot accept connections until every record has been replayed.

What if we could decouple startup time from WAL volume? This talk explores an experimental architecture for on-demand WAL replay, in which the server accepts connections immediately after a crash and individual pages are recovered lazily, only when a client requests them. This turns recovery from a global blocking operation into a granular, per-page cost.
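A minimal C sketch of the lazy idea, under heavy simplifying assumptions that are not taken from the talk: a page is a single integer, each toy WAL record adds a delta to exactly one page, and all state lives in ordinary static arrays rather than shared memory. Startup only scans the WAL and files each record under the page it touches (`index_wal()`); the first read of a page (`read_page()`) replays that page's pending records in LSN order, skipping any record that is not newer than the page LSN, mirroring the page-LSN check in PostgreSQL redo. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, not PostgreSQL code. */
typedef uint64_t Lsn;

typedef struct {
    Lsn lsn;      /* position of the record in the WAL stream */
    int page_id;  /* the single page this toy record modifies */
    int delta;    /* change applied on redo */
} WalRecord;

typedef struct {
    int value;        /* toy page contents */
    Lsn page_lsn;     /* LSN of the last record applied to this page */
    bool needs_redo;  /* true until the page has been lazily recovered */
} Page;

#define MAX_PAGES 8
#define MAX_PENDING 16

/* Per-page list of pending record indexes, built once at startup. */
typedef struct {
    int count;
    int rec_idx[MAX_PENDING];
} PendingList;

static Page pages[MAX_PAGES];
static PendingList pending[MAX_PAGES];
static const WalRecord *wal;
static size_t wal_len;

/* Startup: scan the WAL once and index records by page instead of
 * replaying them. The scan is sequential, so each list is in LSN order. */
void index_wal(const WalRecord *records, size_t n)
{
    wal = records;
    wal_len = n;
    for (size_t i = 0; i < n; i++) {
        int p = records[i].page_id;
        assert(p >= 0 && p < MAX_PAGES);
        assert(pending[p].count < MAX_PENDING);
        pending[p].rec_idx[pending[p].count++] = (int)i;
        pages[p].needs_redo = true;
    }
}

/* Read path: replay only this page's pending records, then serve it. */
int read_page(int p)
{
    assert(p >= 0 && p < MAX_PAGES);
    if (pages[p].needs_redo) {
        for (int k = 0; k < pending[p].count; k++) {
            int idx = pending[p].rec_idx[k];
            assert((size_t)idx < wal_len);
            const WalRecord *r = &wal[idx];
            if (r->lsn > pages[p].page_lsn) {  /* skip already-applied records */
                pages[p].value += r->delta;
                pages[p].page_lsn = r->lsn;
            }
        }
        pending[p].count = 0;
        pages[p].needs_redo = false;
    }
    return pages[p].value;
}
```

Indexing by page is one illustrative way to avoid rescanning the WAL on every read; it still costs one sequential WAL scan at startup, and it is not necessarily the approach the talk describes.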

We will walk through the problems encountered while implementing on-demand recovery, the strategies used to solve them, and the design challenges I am still working to resolve.

Problems Faced:

  1. Replay could not be skipped entirely.
  2. Initially, reading all of the WAL records related to a specific page on every on-demand replay was very slow.
  3. What happens when a timed or manual checkpoint runs while on-demand replay is in progress?
  4. What if no one requests a page for a long time?
  5. What if on-demand replay happens recursively, since a single WAL record can touch more than one page?
  6. pg_basebackup could see inconsistent data.
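Problem 5 arises because replaying one record may require several pages to be current at once. A hypothetical C sketch of one way to reason about it, not presented as the talk's solution: before a record is applied, every other page it touches is first recovered up to, but not including, that record's LSN. Each nested call uses a strictly smaller LSN limit, so the recursion terminates. Assumptions: toy records touch at most two pages, the WAL array is sorted by LSN and scanned linearly for brevity, and `max_depth` exists only to show how deep the recursion went.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model: each record touches up to two pages, and each
 * page keeps a log of the LSNs applied to it so replay order is visible. */
#define NPAGES 4
#define MAXLOG 8

typedef struct {
    uint64_t lsn;
    int pages[2];  /* -1 marks an unused slot */
} Rec;

typedef struct {
    uint64_t page_lsn;  /* last LSN applied to this page */
    int nlog;
    uint64_t log[MAXLOG];
} Pg;

static Pg pg[NPAGES];
static const Rec *recs;
static int nrecs;
static int depth, max_depth;

void set_wal(const Rec *r, int n) { recs = r; nrecs = n; }

/* Redo a record on one page only if the page is older than the record. */
static void apply(const Rec *r, int p)
{
    if (r->lsn > pg[p].page_lsn) {
        assert(pg[p].nlog < MAXLOG);
        pg[p].log[pg[p].nlog++] = r->lsn;
        pg[p].page_lsn = r->lsn;
    }
}

/* Bring page p up to date with every record whose LSN is < limit. */
static void recover_upto(int p, uint64_t limit)
{
    if (++depth > max_depth)
        max_depth = depth;
    for (int i = 0; i < nrecs && recs[i].lsn < limit; i++) {
        const Rec *r = &recs[i];
        bool touches = r->pages[0] == p || r->pages[1] == p;
        if (!touches || r->lsn <= pg[p].page_lsn)
            continue;
        /* Other pages of r must first reach the state just before r;
         * the nested limit r->lsn is strictly below the current limit. */
        for (int k = 0; k < 2; k++) {
            int q = r->pages[k];
            if (q >= 0 && q != p)
                recover_upto(q, r->lsn);
        }
        for (int k = 0; k < 2; k++)
            if (r->pages[k] >= 0)
                apply(r, r->pages[k]);
    }
    depth--;
}

/* A client request recovers the page with all of its records. */
void request_page(int p) { recover_upto(p, UINT64_MAX); }
```

Applying a record to all of its pages at once means a later request for one of the other pages skips it via the page-LSN check instead of replaying it twice.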

Open Challenges:

  1. On-demand replay relies on a shared hash table, but the table's size (the number of pages to recover) must be known up front, which is not possible during crash recovery.
  2. Recent changes in PG 18 make it unsafe to perform replay at the point where pages are read during recovery, because that read now happens inside a critical section.
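The first open challenge stems from fixed-size shared memory: PostgreSQL allocates its main shared memory segment when the postmaster starts, and `ShmemInitHash()` takes a maximum table size at creation. The hypothetical C sketch below models that constraint with a fixed-capacity, linear-probing table of page identifiers; once it is full, any further page that needs lazy replay cannot be tracked. `CAPACITY`, `table_insert()` and the key encoding are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a shared-memory table whose capacity must be
 * fixed before the server opens for connections: the backing array
 * cannot grow later, so an insert past capacity fails. */
#define CAPACITY 4  /* must be chosen up front */

typedef struct {
    bool used;
    uint32_t key;  /* e.g. a packed (relation, block) identifier */
} Slot;

static Slot table[CAPACITY];
static size_t nused;

/* Linear-probing insert (no deletions); returns false when full. */
bool table_insert(uint32_t key)
{
    size_t start = key % CAPACITY;
    for (size_t i = 0; i < CAPACITY; i++) {
        Slot *s = &table[(start + i) % CAPACITY];
        if (s->used && s->key == key)
            return true;  /* page already tracked */
        if (!s->used) {
            s->used = true;
            s->key = key;
            nused++;
            assert(nused <= CAPACITY);
            return true;
        }
    }
    return false;  /* no room: this page cannot be tracked for lazy redo */
}
```

Picking a larger `CAPACITY` only moves the failure point; without knowing how many pages the WAL touches before shared memory is sized, any fixed choice can be exceeded.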
