Substitution is one of the simplest ways to encrypt a message. You take the original (plain text) message and swap each letter of the alphabet for another letter chosen at random, using the same swap every time that letter appears. So, for example, with a set of swaps that includes t->x, a six-letter word can be encrypted as zyhpyx. To the uninitiated this may look unbreakable unless you already know the swapping rules (or have time to try every combination of swaps until the message makes sense). But, even in this tiny example, there is a clue to decrypting it without knowing how it was created. ‘e’ is the most common letter in the English language (it makes up roughly 12% of the letters in typical text), and our cypher (encrypted) text contains two instances of the letter ‘y’, suggesting that ‘y’ a) might be a vowel and b) may very well be an ‘e’. The more text we have to analyse, the easier this becomes.
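To make the idea concrete, here is a minimal Python sketch of a substitution cypher. It assumes the key is made by shuffling the 26 lower-case letters, and the names `make_key`, `encrypt` and `decrypt` are illustrative only, not part of the game. Note that a plain shuffle can occasionally map a letter to itself; this sketch does not prevent that.

```python
import random
import string

def make_key(seed=None):
    """Return a random key mapping each plain letter to a cypher letter."""
    rng = random.Random(seed)
    shuffled = rng.sample(string.ascii_lowercase, 26)
    return dict(zip(string.ascii_lowercase, shuffled))

def encrypt(plain, key):
    """Swap every letter using the key; spaces and punctuation pass through."""
    return plain.lower().translate(str.maketrans(key))

def decrypt(cypher, key):
    """Undo the substitution by inverting the key."""
    inverse = {cypher_letter: plain_letter for plain_letter, cypher_letter in key.items()}
    return cypher.translate(str.maketrans(inverse))

key = make_key(seed=1)
secret = encrypt("meet me at the secret spot", key)
print(decrypt(secret, key))  # meet me at the secret spot
```

Because every occurrence of a plain letter becomes the same cypher letter, the letter counts survive encryption unchanged, and that is the weakness frequency analysis exploits.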
This technique is called Frequency Analysis, and it is a relatively simple way to crack a substitution cypher. Samuel Morse, of Morse code fame, used the same knowledge of letter frequencies to give the most common letters the shortest codes, which is why ‘e’ in Morse code is a single dot (‘.’).
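A crude first pass at frequency analysis can be sketched in Python: rank the cypher letters by how often they occur and pair them, in order, with English letters ranked the same way. The ordering string used here is a commonly quoted approximation and an assumption of this sketch; real rankings shift with the text sampled, and `first_guess` is a hypothetical name.

```python
from collections import Counter
import string

# A commonly quoted ordering of English letters, most to least frequent.
# It is approximate: the exact ranking depends on the text it was measured on.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def first_guess(cypher):
    """Guess a decryption key by matching frequency ranks.

    The most common cypher letter is paired with 'e', the next with 't',
    and so on. Returns a dict from cypher letter to guessed plain letter.
    """
    letters = [ch for ch in cypher.lower() if ch in string.ascii_lowercase]
    ranked = [letter for letter, _ in Counter(letters).most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))

guess = first_guess("zyhpyx")
print(guess["y"])  # e
```

On a six-letter word only the ‘y’ pairing is meaningful and the other four are little better than guesses; the rank matching becomes trustworthy only on longer passages, and even then it is a starting point to refine by hand.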
This simple game illustrates how frequency analysis can be used to decode an encrypted message.
How it Works
When you press the ‘go’ button, the game will randomly select a piece of text to encrypt using a substitution cypher. The text will be one or more opening paragraphs from a famous piece of writing in the English language (though frequency analysis works for any language, and even for text that was translated from one language into another before being encrypted).
You will then be presented with the encrypted/cypher text and a table. The cypher text will be in upper case, which is a convention in cryptography, while the decrypted/plain text will be shown in lower case.
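The case convention makes partial progress easy to read. A small Python sketch of how a display could use it: every cypher letter you have guessed is replaced by its lower-case plain letter, and unsolved letters stay in upper case. The function name and the dict-of-guesses representation are assumptions for illustration, not necessarily how the game is built.

```python
def show_progress(cypher, guesses):
    """Show cypher text with the letters solved so far filled in.

    cypher:  encrypted text, written in upper case by convention
    guesses: dict from an upper-case cypher letter to a lower-case plain letter
    Unsolved letters are left in upper case so they stand out.
    """
    return "".join(guesses.get(ch, ch) for ch in cypher)

print(show_progress("ZYHPYX", {"Y": "e", "X": "t"}))  # ZeHPet
```

Filling in every letter at once, as a ‘give up’ option would, amounts to passing the complete inverse key as `guesses`.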
To aid you in your quest, underneath each letter in the table you will be given the percentage of the cypher text that the letter makes up (e.g. if the cypher text was abcde, then ‘a’ would make up 20% of it). For comparison, the table also shows how often each letter appears in normal, unencrypted English. So, if ‘e’ happened to be encrypted to ‘z’, you’d see that ‘e’ makes up roughly 12% of normal English and, in a reasonably long piece of cypher text, ‘z’ would appear at about the same rate (and even if the figures differ, ‘z’ is quite likely to be the most frequent cypher letter). To make it easier still, you will also get the percentage each letter appears in the plain text. This is mostly there for when you get stuck because, clearly, if ‘y’ appears in the plain text 0.13% of the time, its cypher partner will also appear 0.13% of the time.
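The percentages in the table can be computed along these lines. This Python sketch assumes that only the letters a to z are counted, with spaces and punctuation ignored and case folded, which matches the abcde example; whether the game itself treats spaces this way is not stated, and `letter_percentages` is a hypothetical name.

```python
from collections import Counter
import string

def letter_percentages(text):
    """Return the percentage of the letters in `text` made up by each of a-z.

    Non-letters (spaces, punctuation, digits) are ignored and case is folded,
    so 'A' and 'a' count as the same letter.
    """
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    total = len(letters)
    counts = Counter(letters)
    return {ch: (100 * counts[ch] / total if total else 0.0)
            for ch in string.ascii_lowercase}

pct = letter_percentages("abcde")
print(pct["a"])  # 20.0
```

Because a substitution cypher swaps letters one-for-one, the percentage for any plain letter equals the percentage for its cypher partner, which is why the plain-text column is such a strong hint.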
And if it all gets too much, you can choose the ‘give up’ option, which will fill in the gaps.