Missing Pieces in Python 3 Unicode

by ncoghlan_dev

12 thoughts
last posted March 3, 2015, 6:21 a.m.

However, for the status quo, there are still a few pieces missing. For a "sorta decoded" surrogate-escaped string, the dance to turn it back into a properly decoded string with no surrogates looks like this:

sorta_decoded_str.encode(assumed_encoding, errors="surrogateescape").decode(correct_encoding)

The case where the assumed encoding is latin-1 is just a special case of this one: the surrogateescape error handler will never fire in that situation, because latin-1 is a direct mapping of byte values to the first 256 Unicode code points.
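As a rough illustration of that dance, here is a minimal sketch. The helper name `redecode` and the sample data are made up for this example; ASCII stands in for the wrongly assumed encoding and UTF-8 for the correct one.

```python
def redecode(sorta_decoded, assumed_encoding, correct_encoding):
    """Hypothetical helper: undo a wrong decode, then decode properly."""
    # Encoding with surrogateescape turns each lone surrogate (U+DC80..U+DCFF)
    # back into the original undecodable byte (0x80..0xFF).
    raw = sorta_decoded.encode(assumed_encoding, errors="surrogateescape")
    return raw.decode(correct_encoding)


# Sample data (an assumption for illustration): UTF-8 bytes for "caf\u00e9".
original = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'

# Decoding as ASCII with surrogateescape smuggles the two non-ASCII bytes
# through as lone surrogates instead of raising UnicodeDecodeError.
sorta = original.decode("ascii", errors="surrogateescape")
fixed = redecode(sorta, "ascii", "utf-8")

# latin-1 special case: every byte maps to a code point, so the decode never
# produces surrogates and the error handler is never invoked on the way back.
latin = original.decode("latin-1")
fixed_latin = redecode(latin, "latin-1", "utf-8")
```

In both cases the round trip recovers the original bytes exactly, which is why the final decode with the correct encoding works.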

created July 3, 2013, 1:38 a.m.
