Missing Pieces in Python 3 Unicode

12 thoughts
last posted March 3, 2015, 6:21 a.m.

9 earlier thoughts

0

Something that may be useful is an explicit ability to assert a UTF-8 clean environment, and then have the interpreter operate on that basis:

  • 'surrogateescape' treated like 'strict'
  • default IO encoding set to 'utf-8'
  • standard streams all set to 'utf-8'
  • filesystem encoding set to 'utf-8' (also used for os.environ and sys.argv)

At the moment, Python's desire to be tolerant of environmental configuration errors makes it difficult to enforce consistency when you actually want it (treating any deviations as an error in the environment rather than in Python).

2 later thoughts