PGConf.dev

Fletcher

50-Minute Talk

You might think that UTF-8 solved the problem of text encoding. For many developers that is basically true, but PostgreSQL continues to support dozens of encodings and multi-encoding configurations. There are some rough and even dangerous edges, with implications even if you only use UTF-8. I want to present prototypes that address them with a practical model, along with some other opportunities I have spotted along the way.

Overview of the PostgreSQL text encoding model, related OS concepts and motivations
The holes in that model, including shared catalogs and views, authentication, file systems and more
In which usage patterns do we get away with that? Or not?
A proposed model to nail down the encoding of everything, while allowing for reasonable usage patterns
Overview of closely related pg_wchar, holes and improvements
Opportunities to go faster
What would it take to support NUL in text?

Text encoding débacles
