However, for the status quo, there are still a few pieces missing. For a "sorta decoded" surrogate escaped string, the dance to turn it back into a properly decoded string with no surrogates looks like this:
sorta_decoded_str.encode(assumed_encoding, errors="surrogateescape").decode(correct_encoding)
The case where the assumed encoding is latin-1 is just a special case of this one: the surrogate escape error handler never fires in that situation, because latin-1 is a direct mapping of byte values to the first 256 Unicode code points.
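As a rough illustration of the dance described above, here is a minimal sketch. The sample text, the choice of ASCII as the wrongly assumed encoding, and the variable names other than sorta_decoded_str, assumed_encoding and correct_encoding are assumptions made purely for the example.

```python
# Hypothetical sample data: UTF-8 bytes for "café" (written with an escape
# so the snippet does not depend on the source file's encoding).
raw = "caf\u00e9".encode("utf-8")  # two bytes (0xC3 0xA9) for the last character

assumed_encoding = "ascii"   # the wrong guess made when the data was first read
correct_encoding = "utf-8"   # what the data really was

# Decoding with the wrong encoding plus surrogateescape smuggles the
# undecodable bytes through as lone surrogates in the range U+DC80..U+DCFF.
sorta_decoded_str = raw.decode(assumed_encoding, errors="surrogateescape")
has_surrogates = any(0xDC80 <= ord(ch) <= 0xDCFF for ch in sorta_decoded_str)

# The dance: encode back to the original bytes, then decode correctly.
fixed = sorta_decoded_str.encode(
    assumed_encoding, errors="surrogateescape"
).decode(correct_encoding)

# latin-1 special case: every byte value decodes to a code point, so no
# surrogates are ever produced, but the same round trip still repairs it.
latin1_decoded = raw.decode("latin-1")
fixed_from_latin1 = latin1_decoded.encode(
    "latin-1", errors="surrogateescape"
).decode(correct_encoding)
```

In both cases the intermediate encode step recovers the original bytes exactly, which is why the final decode with the correct encoding can succeed.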