Python Desiderata

9 thoughts
last posted May 24, 2013, 7:59 a.m.

I'd like some things from the Python language that it does not currently have. This is an attempt to maintain a cogent list of those things.

Part of the inspiration for this is that Python 3 has been a huge disappointment to me. It didn't add many of the things I was interested in, and the things it did add that I cared about were poorly implemented and could have been done in much less disruptive ways.

Another inspiration is that I need to work out some of this well enough to file issues on the Python bug tracker, and to maintain a list of links to the issues that I care about as language features (as opposed to bug fixes).

Finally, if CPython doesn't implement some of this stuff, there's still the hope that PyPy might, and I'd like to give those developers some things to consider.

Alright, that's enough front matter: on to the ideas!

Public Vs. Published Methods

One of Python's mottos is "we are all consenting adults". In other words, there's no language-level way to prevent someone from accessing data on an object or a method on an object. Naming conventions are the extent of the protection that you get; tack an underscore on to the front of a method and it's a hint that perhaps you shouldn't call it if you want to be polite.

This is a very practical attitude about the utility of language-level "private" methods, and it has served Python relatively well.

The problem is, this motto makes two assumptions:

  1. If you're trying to invoke private functionality, you are doing it on purpose. In other words, your "consent" is informed.
  2. You - the invoker of private, unsupported functionality - are in control of the upgrades of the component that is being accessed.

The first assumption does not hold, thanks to several unfortunate features of Python. For example, suppose I have a module "foo" that does

# foo.py
from lib import bar  # implementation detail, not part of foo's API

def buz(something):
    ...

__all__ = ['buz']

and then a user of foo writes a module "qux" that uses foo like this:

# qux.py
from foo import bar

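The name "bar" is purely an implementation detail of "foo". foo tried as hard as it could to say so: it set __all__ to declare that buz is the only thing you ought to be importing. But __all__ only governs "from foo import *", so the explicit import above succeeds without complaint, and the author of qux never knowingly consented to anything. Every single private implementation detail is exposed like this by default, unless you go out of your way to build special structures where these things are not exposed.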

It would be nice to have a better default for imports so that this couldn't happen by accident.
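
Here is a small, self-contained sketch of that leak. It builds stand-ins for "lib" and "foo" in memory with types.ModuleType (an assumption made only so the example runs without files on disk; buz is given a trivial body for the same reason) and shows that a star import respects __all__ while an explicit import of bar does not.

```python
import sys
import types

# Stand-in for the "lib" module from the example (hypothetical contents).
lib = types.ModuleType("lib")
lib.bar = lambda: "bar from lib"
sys.modules["lib"] = lib

# Build "foo" in memory with the same shape as the example module.
foo = types.ModuleType("foo")
sys.modules["foo"] = foo
exec(
    "from lib import bar\n"
    "def buz(something):\n"
    "    return something\n"
    "__all__ = ['buz']\n",
    foo.__dict__,
)

# A star import honours __all__: only buz comes across.
star_namespace = {}
exec("from foo import *", star_namespace)
star_names = sorted(n for n in star_namespace if not n.startswith("__"))

# An explicit import of the private name succeeds without any warning.
from foo import bar
leaked = bar()

print(star_names)
print(leaked)
```

Nothing at the import site distinguishes bar from buz, which is the whole problem.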

Share More Stuff Between Processes

In C, every running copy of an executable (or shared library) shares its code in memory, because the file is mapped into each process with mmap rather than copied. Therefore, if you have a multi-process program that has a 100 megabyte executable, you use roughly 100 megabytes of RAM for the code, regardless of how many copies of that process there are.

In Python, by contrast, .pyc files are opened and read with the read() syscall, and their contents are unmarshalled into ordinary objects on the heap. This means that every process has its own copy of all the code in memory. In addition to all the other interpreter overhead, this imposes a non-trivial tax on spinning up additional processes to make use of additional cores.
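
A rough sketch of why nothing ends up shared, using only the standard marshal module. It skips the .pyc header and the file on disk (a simplification for illustration) and just shows that every load of the same marshalled bytes builds a brand-new code object in the loading process.

```python
import marshal

source = "def greet():\n    return 'hello'\n"
code = compile(source, "<example>", "exec")

# A .pyc file is essentially a small header followed by this payload.
payload = marshal.dumps(code)

# Each load allocates fresh code objects on this process's own heap.
first = marshal.loads(payload)
second = marshal.loads(payload)
distinct = first is not second

# The loaded code object is fully usable, it is just private to us.
namespace = {}
exec(first, namespace)
result = namespace["greet"]()

print(distinct, result)
```

Two loads in one process already give two copies; N worker processes importing the same modules give N copies, none of them backed by a shared mapping.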

Anonymous Blocks, Un-Crippled Lambda Expressions

Generator comprehensions are a not-terribly-effective workaround for the lack of anonymous blocks.

While they work nicely enough for some common use-cases, they also lead to heinous hacks when trying to satisfy reasonable use-cases.

Maybe this means that you need to be able to express suites as expressions. That's fine.
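
To make "hack" concrete, here is one workaround of the general sort I mean. This is my own illustrative sketch, and each_of is a hypothetical helper, not a library function: a decorator that immediately calls the function it decorates, so that the def body stands in for the anonymous block a lambda can't hold.

```python
def each_of(values):
    """Run the decorated function over values right away (hypothetical helper)."""
    def decorator(block):
        # The name bound by "def" ends up holding the results,
        # not the function itself.
        return [block(value) for value in values]
    return decorator


@each_of([1, 2, 3])
def results(value):
    # Multiple statements: exactly what a lambda cannot express.
    if value % 2:
        return value * 10
    return value


print(results)  # [10, 2, 30]
```

It works, but the reader has to notice that results is a list rather than a function, which is the kind of surprise a real anonymous block would avoid.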

Homogeneous, Predicate-Based, Type-Enforcing Containers

Consider this fairly typical Python class:

class Publisher(object):
    def __init__(self):
        self._subscribers = []
    def publish(self, event):
        for subscriber in self._subscribers:
            subscriber.published(event)
    def subscribe(self, subscriber):
        self._subscribers.append(subscriber)

If a user of this class has a bug where they accidentally pass something that isn't a Subscriber to subscribe, it fails slow. You don't find out that you screwed up until the next call to publish, and by that point, it's likely too late to figure out who, exactly, screwed up.

The only way to deal with this is to insert tedious isinstance checks everywhere, raise your own TypeErrors with your own error messages, and so on. The result: nobody ever bothers.

I want to be able to do, instead, something like this:

self._subscribers = list[Subscriber]()

Then I could get a nice TypeError at subscribe time instead of publish time.
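
Until something like that exists, the closest approximation I know of is a hand-rolled container. The sketch below is only that: CheckedList is a hypothetical name, it is built on collections.abc.MutableSequence, and the Subscriber class is a minimal stand-in so the example runs on its own.

```python
from collections.abc import MutableSequence


class CheckedList(MutableSequence):
    """A list that rejects items of the wrong type at insertion time."""

    def __init__(self, item_type, items=()):
        self._item_type = item_type
        self._items = []
        self.extend(items)  # extend() goes through insert(), so it is checked

    def _check(self, value):
        if not isinstance(value, self._item_type):
            raise TypeError(
                "expected %s, got %s"
                % (self._item_type.__name__, type(value).__name__)
            )
        return value

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            self._items[index] = [self._check(v) for v in value]
        else:
            self._items[index] = self._check(value)

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._items.insert(index, self._check(value))


class Subscriber(object):
    """Minimal stand-in subscriber that records what it receives."""

    def __init__(self):
        self.events = []

    def published(self, event):
        self.events.append(event)


class Publisher(object):
    def __init__(self):
        self._subscribers = CheckedList(Subscriber)

    def publish(self, event):
        for subscriber in self._subscribers:
            subscriber.published(event)

    def subscribe(self, subscriber):
        self._subscribers.append(subscriber)  # raises TypeError right here
```

With this, Publisher().subscribe("oops") raises TypeError inside subscribe, with the guilty caller still on the stack.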

It would also be great if I could do:

self._subscribers = dict[str:Subscriber]()
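
The mapping version can be approximated the same way. Again a sketch under my own assumptions: CheckedDict is a hypothetical name built on collections.abc.MutableMapping, and Subscriber is an empty stand-in class.

```python
from collections.abc import MutableMapping


class CheckedDict(MutableMapping):
    """A dict that checks key and value types on every assignment."""

    def __init__(self, key_type, value_type):
        self._key_type = key_type
        self._value_type = value_type
        self._data = {}

    def __setitem__(self, key, value):
        if not isinstance(key, self._key_type):
            raise TypeError("bad key type: %s" % type(key).__name__)
        if not isinstance(value, self._value_type):
            raise TypeError("bad value type: %s" % type(value).__name__)
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class Subscriber(object):
    """Empty stand-in so the example is self-contained."""


subscribers = CheckedDict(str, Subscriber)
subscribers["alice"] = Subscriber()
```

Because update() and setdefault() are inherited from MutableMapping and route through __setitem__, they get the same checks for free.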