I have often wondered about what I'll call, for lack of a better term, 'phonotactic coverage'.
That is, of all the lemmas that a language's phonotactics would permit, what fraction actually exist in the language?
I suppose you could also call it phonotactic density.
My intuition has been that Yiddish is particularly dense relative to what its phonotactics allow. Now that I'm studying Italian, I wonder whether it is too.
I'll admit, my primary motivation for posting this was a hope that jtauber would have something to say about it.
I had a similar thought about where to start, though not with English. Ideally you'd use a word list in the most phonemic spelling available. That's one of the reasons I thought of Yiddish, whose romanization (YIVO) is very regular and reflects its phonology closely. You'd have to come up with some mildly sophisticated rules for a lexer that crawls the words and builds syllables (mostly for recognizing diphthongs and consonant clusters), but nothing too hairy.
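For what it's worth, here is a rough Python sketch of what such a lexer might look like. Everything in it is an assumption of mine: the grapheme list is a small, incomplete set loosely modeled on YIVO-style spellings, the syllable split is a crude "last consonant of a medial cluster starts the next syllable" rule, and the "possible" syllables are just every onset/nucleus/coda combination seen in the word list, which is only a stand-in for real phonotactic rules. It also measures coverage over syllables rather than whole lemmas.

```python
# Hypothetical, incomplete grapheme inventory (loosely YIVO-like); longest first.
MULTIGRAPHS = ("tsh", "dzh", "sh", "zh", "kh", "ts", "ay", "ey", "oy")
VOWELS = {"a", "e", "i", "o", "u", "ay", "ey", "oy"}


def tokenize(word):
    """Greedy longest-match split of a lowercase romanized word into graphemes."""
    tokens, i = [], 0
    while i < len(word):
        for g in MULTIGRAPHS:
            if word.startswith(g, i):
                tokens.append(g)
                i += len(g)
                break
        else:
            # No multigraph matched here, so take a single letter.
            tokens.append(word[i])
            i += 1
    return tokens


def syllabify(tokens):
    """Crude split into (onset, nucleus, coda) triples.

    Assumption: in a word-medial consonant cluster, only the last
    consonant goes to the following onset; the rest stay in the coda.
    """
    nuclei = [i for i, t in enumerate(tokens) if t in VOWELS]
    syllables, start = [], 0
    for k, v in enumerate(nuclei):
        if k + 1 < len(nuclei):
            # Leave one consonant (if there is one) for the next onset.
            end = max(v + 1, nuclei[k + 1] - 1)
        else:
            end = len(tokens)
        onset = "".join(tokens[start:v])
        coda = "".join(tokens[v + 1:end])
        syllables.append((onset, tokens[v], coda))
        start = end
    return syllables


def coverage(words):
    """Attested syllables divided by all onset x nucleus x coda combinations."""
    attested = {s for w in words for s in syllabify(tokenize(w))}
    onsets = {o for o, _, _ in attested}
    nuclei = {n for _, n, _ in attested}
    codas = {c for _, _, c in attested}
    possible = len(onsets) * len(nuclei) * len(codas)
    return len(attested) / possible if possible else 0.0
```

With a real word list you'd call `coverage(words)` and compare the number across languages. The product of inventories overcounts badly, since it allows any onset with any coda, so a serious version would need actual phonotactic constraints in place of that step.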