Missing Pieces in Python 3 Unicode

by ncoghlan_dev

12 thoughts
last posted March 3, 2015, 6:21 a.m.

When we did the Python 3 migration, we knew we were swinging a wrecking ball through all the strategies people were using to cope with the messy reality of the blurry boundary between binary and text data.

Python 2 assumes you live on that boundary all the time, and will gleefully corrupt data by allowing implicit combination of data from different sources with different encodings.

The core Python 3 model is different: it assumes a shiny, happy world where text is text and binary data is binary data, where we use encoding and decoding to get between them, where encoding declarations are never wrong, and where data is never corrupted.
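As a rough illustration of that model (a minimal sketch, not anything from the migration itself), the snippet below shows Python 3 refusing to combine `str` and `bytes` implicitly, and what happens when the "encoding declarations are never wrong" assumption fails. The sample string and the helper name `try_mix` are made up for the example.

```python
# Text and its encoded binary form are distinct types in Python 3.
text = "caf\u00e9"            # 'café' as text (str)
data = text.encode("utf-8")   # b'caf\xc3\xa9' as binary data (bytes)


def try_mix(a, b):
    """Hypothetical helper: attempt a + b and return the error instead of raising."""
    try:
        return a + b
    except TypeError as exc:
        return exc


# Implicit combination of text and binary data is rejected outright.
mixed = try_mix(text, data)

# Explicit decoding with the correct encoding round-trips cleanly.
round_trip = data.decode("utf-8")

# Decoding with the wrong declared encoding does not fail here:
# it silently produces mojibake ('cafÃ©').
wrong = data.decode("latin-1")

print(type(mixed).__name__)   # TypeError
print(round_trip == text)     # True
print(wrong)                  # cafÃ©
```

The last case is the interesting one: the type separation catches accidental mixing, but it cannot tell you that the encoding you were told about is the wrong one.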

created July 3, 2013, 1:07 a.m.
