Labatt
25-Minute Talk
Postgres availability after a crash is currently bound by the volume of WAL generated since the last checkpoint: the server cannot accept connections until every single record has been replayed.
What if we could decouple startup time from WAL volume? This talk explores an experimental architecture for on-demand WAL replay, where the server accepts connections immediately after a crash and individual pages are recovered lazily, only when a client requests them. This shifts recovery from a global blocking operation to a granular, page-level cost.
We will walk through the problems encountered while implementing on-demand recovery, the strategies used to solve them, and the design challenges that remain open.
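To make the idea concrete, here is a minimal toy model of lazy replay in plain Python. This is not Postgres code: `LazyRecoveryStore`, its page store, and its `(lsn, page_id, value)` record format are all invented for illustration. The point is the shape of the design: startup does a single cheap pass to index pending records by page, and replay happens on first read of each page.

```python
from collections import defaultdict

class LazyRecoveryStore:
    """Toy model of on-demand WAL replay (illustrative, not Postgres)."""

    def __init__(self, pages, wal):
        # Baseline page images as of the last checkpoint.
        self.pages = dict(pages)
        # Startup cost: one pass to index records by page, not to replay them.
        self.pending = defaultdict(list)
        for lsn, page_id, new_value in wal:
            self.pending[page_id].append((lsn, new_value))

    def read(self, page_id):
        # Replay this page's pending records lazily, in LSN order,
        # the first time the page is requested.
        for lsn, new_value in sorted(self.pending.pop(page_id, [])):
            self.pages[page_id] = new_value
        return self.pages[page_id]

store = LazyRecoveryStore(
    pages={"A": 0, "B": 0},
    wal=[(1, "A", 10), (2, "B", 20), (3, "A", 30)],
)
print(store.read("A"))  # 30 -- A's two records replayed on demand
print(store.read("B"))  # 20
```

The pay-per-page cost model is visible here: reading "A" replays only A's two records, and pages nobody touches are never replayed at all, which is exactly where several of the problems below come from.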
Problems Faced:
- Replay could not be skipped entirely.
- Initially, scanning every WAL record related to a given page on each on-demand replay was very slow.
- What happens when a timed or manually triggered checkpoint runs during on-demand replay?
- What if no one requests a page for a long time?
- What if on-demand replay happens recursively, since a single WAL record can touch more than one page?
- pg_basebackup was able to see inconsistent data.
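The recursion hazard in the list above can be shown with a small sketch: because one record can touch several pages, recovering page A may apply a record that also involves page B, whose own recovery can circle back to A. A simple guard is an explicit in-progress set plus a per-record applied flag. This is a toy illustration of the hazard and one possible guard, not the prototype's actual solution.

```python
def recover(page_id, pending, in_progress=None):
    """Lazily replay all records touching page_id, guarding against the
    recursion that multi-page WAL records can cause (toy model)."""
    if in_progress is None:
        in_progress = set()
    if page_id in in_progress:
        return []            # cycle: this page is already being recovered
    in_progress.add(page_id)
    replayed = []
    for rec in pending.pop(page_id, []):
        if rec.get("applied"):
            continue         # record is indexed under several pages; apply once
        # Recover every other page the record touches first -- without the
        # in_progress guard, this is where unbounded recursion would start.
        for other in rec["pages"]:
            if other != page_id:
                replayed += recover(other, pending, in_progress)
        if not rec.get("applied"):
            rec["applied"] = True
            replayed.append(rec["lsn"])
    in_progress.discard(page_id)
    return replayed

r1 = {"lsn": 1, "pages": ["A", "B"]}   # one record touching two pages
r2 = {"lsn": 2, "pages": ["B"]}
pending = {"A": [r1], "B": [r1, r2]}
print(recover("A", pending))  # [1, 2]
```

Recovering "A" drags in all of "B"'s pending records through the shared record, which hints at why recursion depth and double-application both need explicit handling.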
Open Challenges:
- On-demand replay relies on a shared hash table, but its size (the number of pages to recover) must be known up front, which is not possible during crash recovery.
- Recent changes in PG 18 make it unsafe to perform replay at the point where pages are read during recovery, as this now happens inside a critical section.

