It's 1944, and you have just been handed the following cipher text, intercepted from a German command post in France:
There is reason to believe that this text was not encrypted with the sophisticated German Enigma machine but instead used a Vigenère cipher with a simple pass phrase. Thus, it is vulnerable to frequency analysis.
Decode this message and find out where the Germans are sending their bombers so we can evacuate the civilians. You will enter the number of the district (as shown in this list ) to submit your answer.
Details and assumptions
A Vigenère cipher is a more complicated shift cipher. For example, rather than shifting every character in a word by two letters like this:
SECRET -> UGETGV
you would instead choose a pass phrase (e.g., "CAT"), which would then tell you to shift each of the letters in your plain text by the letter in the passphrase ("C" is a shift of 2 character, "A" is a shift of none, etc. ) :
C A T C A T (passphrase)
S E C R E T (plain text)
U E V T E M (cipher text)
Note: This problem is loosely based on cryptographic methods used around the time of the invasion of Normandy in 1944.
This section requires Javascript.
You are seeing this because something didn't load right. We suggest you, (a) try
refreshing the page, (b) enabling javascript if it is disabled on your browser and,
finally, (c)
loading the
non-javascript version of this page
. We're sorry about the hassle.
I used Kasiski examination to find the key length & then used frequency analysis on the 5 columns. Got HIEPO, tried using T for some and worked out HIPPO.
Wow... Nice solution and very-very clear.
But I am using this this is calculator with length keyword is 5 here , Actually the key is HIEPO(but not good result), I change become keyword HIPPO.
Log in to reply
Interesting site. I tried using it, but it told me that the key length seems to be length 12 based on their analysis, which of course leads to a meaningless keyword. Were you trying different possible lengths and happened to chance upon HIEPO?
With computer science, we should be trying to understand how to write the code, and why the code works, as opposed to just leeching off someone else. This is similar to doing the math problems using Wolfram to solve, when you don't know how to approach it.
Log in to reply
You can use this site to get the length length keywords , this site very clear say that the length of keyword is 5. I open much site to getthe keyword.
I think that's right. In my reorder function, where I restrict the index to be between 1 and 26 I unintentionally introduced an offset of 1. In mine the convention is then 'A means a shift of one' when the standard convention is 'A means a shift of zero'.
Log in to reply
In the way you used, Z would be a shift of 2 6 , but that would still be 0 because the alphabet would cycle around.
http://smurfoncrack.com/pygenere/pygenere.php might be a better cracker.
Log in to reply
Wow... without Analyze... Fantastic Searching...
what a nice solution ! May I ask which programming language did you use to crack this code ?
Log in to reply
I used the programming language inside Mathematica though not any Mathematica libraries.
Look for letter patterns and the spacing of the patterns; This will aid in determining key length. Below is a excerpt of that analysis, which clearly shows that the key length is likely 5 letters in length.
Next, do five separate frequency analysis of every fifth letter (e.g. every fifth letter starting from first letter. Every fifth letter starting from the second letter. etc.) Below is an excerpt of that analysis, which compares the analysis to a standard English language frequency table to determine the shift.
You then can figure out the key word "HIPPO", and decipher the text. The last line of the message reads "...SEND HEAVY BOMB REPRISALS TO MANCHESTER...", and Manchester is the 6th District.
I really like Josh Silverman's solution below with Mathematica!
I used Python. First, I assumed that parts of the coded message might repeat (that is, Kasiski examination.) So I searched for large substrings within the ciphertext that repeat (substring length > 3).
It turns out that HICWUPN repeats at positions 237, 287, 602. AIDD repeats at positions 206, 326, 806, 916, 1066. While GAWE repeats at positions 394, 529, 889.
By inspection, it can be seen that the distance between repeated substrings in each case (287 - 237 = 50; 602 - 287 = 315 etc.) has a gcd of 5. So the key length is probably 5.
Step Two was letter-frequency analysis. Or more precisely, 5 frequency analyses – one for each letter of the key.
To find the first letter of the key, I extracted every fifth letter from the ciphertext, starting at position 0, and performed a frequency analysis on that extracted substring. It turns out the most common letters of that extracted substring are (in descending order) LAHZVUPTKY.
Now all the letters in LAHZVUPTKY are encrypted using the first letter of the key, so if we decode LAHZVUPTKY with the correct first letter of the key, we should get a string containing the most common letters in natural English, with the most common letter first.
There are only 26 possibilities to consider, so this can be done by inspection. We try decoding LAHZVUPTKY using all letters A-Z, and choose the best.
Most results look like a random jumble of letters. If you decode LAHZVUPTKY with the letter Q, for example, we get the result VKRJFEZDUI, which in no way reflects the most common letters in English.
But decoding LAHZVUPTKY with the letter H gives a plausible order of frequent letters: ETASONIMDR. This looks about right; E is the most common letter in English, and TASON are each fairly common. So H is probably the first letter of the key.
We continue with extracted substrings containing every fifth letter, starting at Positions 1,2,3,4. And we discover that the most plausible letters for these letters of the key are I,P,P,O, giving the following letter frequency analyses:
Position 0: LAHZVUPTKY; decrypted with H: ETASONIMDR
Position 1: MQAWIVBZK; decrypted with I: EISOANTRC
Position 2: ITCXHGD; decrypted with P: TENISRO
Position 3: TIPXDCHB; decrypted with P: ETAIONSM
Position 4: SBHCGODW; decrypted with O: ENTOSAPI
And the key is HIPPO.
The way I used to solve is, you need make a frequency table using a programming language of your choice, make a program that allows you to input text and counts how many times (throw them into an array), if you use C++ just use the algorithm standard library, which I prefer not to tell since that's point of the question, right? Compare them to the english letter frequency table, see similarities in the frequency, you can get well, interesting results, although you can improve the program to run a kasiski test (hint hint wink wink, history hint) I used this method. Took me a couple of hours to solve, I also referenced the khan academy video on the caesar cipher, although different its similar what not, I only used coding for frequency analysis. You can learn that the cipher is vulnerable because it can be analyzed, which the worst thing you want as a cipher since it pretty much gives an answer. Another trick is to simply guess out letters in case you don't see a pattern, its close to brute forcing but still reasonable although it only works if you know , say I have the encrypted message I can look at it as three stacks, the plain text on top, key in the middle and encrypted text on bottom. You can throw in common words or simply all the answers converting them through the chart, but its still painstaking without knowing where the answer is so it is unreasonable to use this method, it only works against shorter text. Sorry its a bit lengthy but I hope its some helpful insight on "how" you can get the answer, not "what" the answer is, for the sake of learning.
Here's all the text:
MAY EIGHT ENIGMA MACHINE BROKEN FALLING BACK ON SECONDARY CODE SYSTEM STOP
MAY NINE DIVISION THREE MOVING SOUTH TOWARD LE HAVRE STOP
MAY TEN ENEMY NAVAL ACTIVITY DETECTED OFF COAST OF RAMSGATE STOP
MAY TEN SUSPECT ENEMY MAY BE MASSING FOR INVASION STOP
MAY ELEVEN WHAT IS YOUR POSITION STOP
MAY TWELVE EXTENSIVE ENEMY NAVAL ACTIVITY NEAR CALAIS STOP
MAY THIRTEEN PLEASE WHAT IS YOUR POSITION STOP
MAY FOURTEEN INCOMING STORM MAY INTERRUPT COMMUNICATIONS FOR SEVERAL DAYS STOP
MAY EIGHTEEEN MINOR SKIRMISH IN CHANNEL WITH LOSSES ON BOTH SIDES STOP
MAY TWENTY DIVISION THREE ARRIVED IN LE HAVRE STOP
MAY TWENTY ONE WHAT IS YOUR POSITION STOP
MAY TWENTY TWO DIVISION THREE ASSISTING WITH PREPARATIONS FOR BEACH HEAD DEFENSE STOP
MAY TWENTY NINE LACKING AMMUNITION FOR MACHINE GUN BUNKERS PLEASE SEND WITH RESUPPLY STOP
MAY THIRTY MORE ENEMY ACTIVITY NEAR CALAIS MOVING DIVISION THREE TO CALAIS STOP
JUNE ONE LE HAVRE AND OTHER BEACHES SECURE MASSING DEFENSE AT CALAIS STOP
JUNE TWO REPORTS OF EXTENSIVE TROOP MOVEMENTS AND NAVAL ACTIVITY OFF ENTIRE COAST OF BRITTAIN STOP
SUSPECT INVASION IMMINENT STOP
SENT ALL POSSIBLE REINFORCEMENTS TO CALAIS STOP
JUNE THREE ALL POSSIBLE PREPARATIONS COMPLETED STOP
JUNE FOUR WORRIED ABOUT POSSIBLE ENEMY INTERCEPTION OF COMMUNCATIONS STOP
JUNE SIX NORMANDY INVADED SEND HEAVY BOMB REPRISALS TO MANCHESTER ON JUNE EIGHT STOP
Problem Loading...
Note Loading...
Set Loading...
I believe there are far more elegant ways to decode the Vigenère cipher, but here is my approach which is more like a bad episode of Law and Order than a beautiful algorithmic solution. I used Mathematica for my work.
First, I wrote a function to sample the characters of the cipher text (cipherText) according to different periods, n
I use this function to take periodic samples of the text to evaluate as potential shifts.
As indicated in the problem, an easy thing to do is a frequency analysis. Natural languages have a structure to them that reflects common usage patterns. For instance, in English, the letters e, t, and a are boosted in comparison to the rest of the alphabet, and the letters x, z, q, and j are very rare compared to the rest of the alphabet
If we pick the wrong key length for the cipher text and analyze letter frequency, we can expect that the frequencies for the rarest letters will be boosted compared their frequencies for a correct key length (those letters will have low frequencies from their occasional appearance in the correct sampling as it comes in and out of frame with the erroneous key length we pick). Similarly, the letters that represent e, t, and a will be artificially lowered as they mix with out of frame sampling.
This should tend to make the letter distribution more linear and destroy the rich structure found in the natural usage.
A shift of n results in n sub-cipher texts, each with a characteristic letter distribution. In order to find the right key length for the encrypted text, I computed the average letter distribution vector (after sorting each vector by value) for each key length, took the mean of the n vectors (to establish an average letter frequency vector for the given key length) and plot them all with the frequency vector for the English language.
It appears that the letter frequency vector for a key length of five (green) maintains the structure of the natural language (blood orange), whereas the other key lengths are linearized by the sampling.
To confirm the finding above, I took the mean letter frequency vectors for key lengths from 2 to 30 and took their inner product with the sorted letter frequency vector for the English language:
The inner product has peaks at key lengths which are multiples of 5.
This suggests that the key length is indeed 5.
Now, we look at the frequency tables of the sub-cipher texts for the key length 5:
Let's analyze the first sub-cipher text to figure out the shift that was applied.
If we order the letter frequencies in alphabetical order and plot them (orange) against the letter frequencies for the English language (black), we see the following:
Obviously, they are out of frame. By inspection, it appears that if we shift the sub-cipher (orange) to the left, we could make them overlap
Indeed, they overlap as well as might be expected.
This shows that the shift used for the first sub-cipher was 7 and therefore the key begins with a G.
Repeating this analysis for the other sub-ciphers, we find the key that was used is
With the key in hand, we can translate the cipher text with the following Mathematica code for Alphabet shifting
which gives a list of the unencrypted sub-plain texts.
We now join them
and present the unencrypted plain text:
Final transmission: JUNESIXNORMANDYINVADEDSENDHEAVYBOMBREPRISALSTOMANCHESTERONJUNEEIGHT
As the last message says, the command is to bomb Manchester, England in response to the invasion at Normandy.