If you could read the title just fine, it is because the English language (like all natural languages) is redundant. This isn't to say that there are multiple words that mean the same thing (although there are), but that if you compare the number of questions you would have to ask to uniquely identify a word I'm thinking of, and the number of possible words, the second number is much bigger than the first.
For example, by the time you read the first four letters of a word starting with "calc", you can be pretty sure it's going to end up being "calculus", "calcium", "calculate", or "calculation". You don't need all the extra letters to distinguish the remaining possibilities. Concretely, if we take a list of all the words of a given length that exist, and sort them into alphabetical order, we only need $\log_2 N$ questions to identify any given word in the list (where $N$ is the number of words in the list). On the other hand, if we were making full use of the language, we could manage $26^L$ unique words of length $L$.
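To make the question counting concrete, here is a minimal sketch in Python. It assumes each "question" is a yes/no comparison against the middle of the remaining alphabetical list (a plain binary search); the function name `identify` and the tiny four-word list are illustrative only, not part of the original problem.

```python
import math


def identify(sorted_words, target):
    """Find `target` in an alphabetically sorted list by repeated halving.

    Returns the word found and the number of yes/no questions asked.
    Assumes `target` is actually in the list.
    """
    lo, hi = 0, len(sorted_words) - 1
    questions = 0
    while lo < hi:
        mid = (lo + hi) // 2
        questions += 1  # "Does your word come after sorted_words[mid]?"
        if target > sorted_words[mid]:
            lo = mid + 1
        else:
            hi = mid
    return sorted_words[lo], questions


# The four "calc" words from the text, in alphabetical order.
calc_words = ["calcium", "calculate", "calculation", "calculus"]
word, asked = identify(calc_words, "calculus")
print(word, asked)  # 4 words take log2(4) = 2 questions

# Letter strings available versus words in use, for length 5.
possible = 26 ** 5
print(possible, math.log2(possible), math.log2(10230))
```

With only four candidates left, two questions settle the matter, no matter how many letters the words still have.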
Using the English language dictionary built in to UNIX operating systems, and filtering for words of length 5, I find 10230 unique words. Taking words of length 5 as a proxy for the entire English language, how short, on average, could we make five letter words before someone with perfect reasoning couldn't read them anymore?
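A count like this can be sketched as follows. The problem does not say which file or filtering rule was used, so both are assumptions here: the path `/usr/share/dict/words` is a common location on UNIX-like systems, and the filter (lowercase everything, keep only plain ASCII letters) is one plausible choice. A different file or filter will give a different count from 10230.

```python
def unique_words_of_length(lines, length=5):
    """Collect the distinct words of a given length from an iterable of lines.

    Assumed filter: case-insensitive, ASCII letters only (no apostrophes,
    hyphens, or accented characters).
    """
    words = set()
    for line in lines:
        word = line.strip().lower()
        if len(word) == length and word.isascii() and word.isalpha():
            words.add(word)
    return words


def count_in_file(path="/usr/share/dict/words", length=5):
    """Count distinct words of `length` in a word-list file (path is an assumption)."""
    with open(path, encoding="utf-8", errors="ignore") as handle:
        return len(unique_words_of_length(handle, length))


# Small in-memory example, so the sketch runs without the dictionary file.
sample = ["Apple\n", "apple\n", "pear\n", "can't\n", "grape\n"]
print(sorted(unique_words_of_length(sample)))
```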
Thanks to texting, I read the first word of the title as "Why".
Yeah, no. The answer has to be an integer number of letters. Sorry, Josh. Thanks for playing.
@Al Fargnoli Why does an average have to be an integer?
Because it is impossible to construct a message with a non-integer length of characters. Someplace you have to use a floor, ceiling, or rounding function.
@Al Fargnoli – The actual words would be of integer length, but their average (which is what the exercise asked for) doesn't need to be.
@Tijmen Veltman – "how short, on average, could we make five letter words"
He didn't ask for the average value. "On average" is something different, and means "generally speaking". That makes the answer '3'.
@Patrick Sims – The answer 2.83389 was marked as correct, so that's a pretty strong indication that he did indeed mean the average.
Sorry, but I cannot agree with this because we are talking about English. In English, 40% of the letters are vowels. Certain digraphs tend to be highly prevalent--think "th" and "qu". The upshot is that the entropy of the language is reduced by roughly 60% on a per word basis. These facts are what make it possible to solve simple substitution ciphers. In any case, if we drop the first and last letter of the five-letter word, the ambiguity skyrockets. Actually, this principle applies to words of arbitrary length. Ask any professional cryptographer.
Use of dictionary words shortens our word list from $26^5$ to $10230$, which decreases the number of questions required to identify a word from $\log_2 26^5$ to $\log_2 10230$.
So on average, we could shorten a word proportionally to the decrease in the number of questions required to identify it, giving us a length $L = 5 \cdot \frac{\log_2 10230}{\log_2 26^5} = \frac{5}{5} \cdot \frac{\log_2 10230}{\log_2 26} = \log_{26} 10230 \approx 2.83389$
As mentioned in the problem, for a word of length $L$ we have $26^L$ possibilities. In our case we have 10230 different words to make, so we need $L$ to satisfy $26^L = 10230$ on average. This gives $L = \log_{26} 10230 \approx 2.834$.
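The arithmetic in the solutions above can be checked with a few lines of Python. This is only a sketch; the function name `average_length_needed` is made up for illustration. It also prints the smallest whole number of letters that would work if every word had to be the same length, which is the figure the integer-versus-average discussion in the comments turns on.

```python
import math


def average_length_needed(n_words, alphabet_size=26):
    """Solve alphabet_size ** L == n_words for L (L need not be an integer)."""
    return math.log(n_words, alphabet_size)


n_words = 10230  # five-letter words counted in the problem statement
L = average_length_needed(n_words)
print(round(L, 5))

# Smallest fixed whole-number length that still gives every word its own code:
# 26**2 = 676 is too few strings, 26**3 = 17576 is enough.
print(math.ceil(L))
```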