Fletcher
50-Minute Talk
You might think that text encoding is a problem that was solved by UTF-8. That is largely true for many developers, but PostgreSQL continues to support dozens of encodings and multi-encoding configurations. There are some rough and even dangerous edges, with implications even if you only use UTF-8. I will present prototypes that address them with a practical model, along with some other opportunities I have spotted along the way.
- Overview of the PostgreSQL text encoding model, related OS concepts and motivations
- The holes in that model, including shared catalogs and views, authentication, file systems and more
- Which usage patterns get away with those holes, and which do not?
- A proposed model to nail down the encoding of everything, while allowing for reasonable usage patterns
- Overview of the closely related pg_wchar, its holes and possible improvements
- Opportunities to go faster
- What would it take to support NUL in text?

