What is Cryptanalysis?
Cryptanalysis is the term for attempting to read an encrypted message without knowing the secret key, if the cipher used to encrypt is a fairly strong one then cryptanalysis is inherently very difficult.
The first great leap in cryptanalysis was the profound though seemingly self-evident observation that languages use a number of letters and use them in different proportions. In English, for example, the letter 'E' is the most commonly used, followed by 'T', 'A', 'O' and 'N'. The letters 'Q', 'X', 'V' and 'Z' are relatively uncommon.
If a message is encrypted using a monoalphabetic cipher then these differences can be exploited to provide reasonable guesses as to the substitutions making up the cipher. For instance, if the ciphertext has the letter 'P' as its most common then it is reasonable to assume that the ciphertext letter 'P' equates to the plaintext letter 'E'.
Following this process it is often possible to break a monoalphabetic cipher extremely quickly.
Another feature of languages is the way in which letters are ordered relative to each other. In English, the letter 'T' is followed by 'H' more than by any other letter, the letter 'H' is most commonly followed by 'E'.
In conjunction with frequency analysis this information would make the detailed selection of substituted pairs simpler, in some cases much simpler. If the message contained the letters 'SF' in a few places, 'F' was fairly common throughout the message but 'S' was only ever seen in front of 'F' then it might be fairly easy to state that 'F' equated to the plaintext 'U' and 'S' to the plaintext 'Q'.1
The invention of the polyalphabetic Vigenère cipher had a dramatic impact on cryptology. Because it defied frequency analysis and therefore contact analysis as well it destroyed the two tools that were known to break ciphers. It was heralded as a truly unbreakable cipher, one that would never be cryptanalysed.
The man who first identified the way to break the Vigenère cipher was Charles Babbage, more famous as the inventor of the stored-program computer. As is often the case with ciphers, an amateur had independently discovered the Vigenère method and published a letter in a newspaper challenging anyone to break it. Babbage immediately responded that he could break it easily, but at that point he was encouraged not to reveal his method to the world at large. Vigenère was used at that time as the standard cipher for the world's governments and armies.
A Polish mathematician, Friedrich Kasiski, subsequently published what has become known as the Kasiski method, and it is now clear that this method was the same that Babbage had independently identified.
The method works on a sound principle. While it is impossible to use frequency analysis on a Vigenère ciphertext, it would be possible to use frequency analysis on each letter in the ciphertext that happened to be encrypted using the exact same letter of the key. In other words, if it was possible to identify the length of the repeating key and that length was, for instance, seven characters, then the first letter of the ciphertext would have been encrypted with the same key letter as the eighth letter.
If that is known then the first, eighth, fifteenth, twenty-second etc. letters of the ciphertext could be isolated and a frequency analysis performed.
This still leaves the problem of how to identify the length of the key. The answer proved to be contact analysis once again. Because the letter pair 'TH' is so common in English it is likely to have been encrypted several times in a message by the same pair of key letters, leaving the same pair of ciphertext letters. Simply look for repeating pairs of letters and measure the distance between them. If there are repeating triple letter groups or ones even longer then that provides even more certainty. Obviously this won't always provide you with the exact answer you are looking for; if you found for example:
Then it might take a few moments of thought before deciding that all the gaps were divisible by seven except the single pair at nine spaces and neither nine nor three are anywhere near as clearly identified as seven. In this case the nine spaces entry is purely bad luck, just a fluke.
- Three pairs of pairs at seven places difference
- One pair of pairs at nine places difference
- One pair of pairs at twenty-one places difference
- Two pairs of pairs at thirty-five places difference
Another technique, and a powerful one, is available to the cryptanalyst if there are several messages to work with that were encrypted with the same key. In fact this technique works even on one-time pad ciphers, which is why OTPs are only supposed to be used once. Kerckhoffs, one of the great cryptologists, discovered it and published it in a military communications magazine.
The principle is simple, once again it works by finding something that can be cryptanalysed using frequency analysis, this time it is the assumption that two messages encrypted with the same key will have had both their first letters encrypted the same way, the same with both their second letters and so on. If you have sufficient numbers of messages that can be used in this way you again have a large number of simple monoalphabetic substitutions to resolve. The number of messages to be placed over each other is known as the depth and the technique is usually called superimposition.
Kerckhoffs had another important contribution to the field as well, he noted six attributes that a good military field cipher should posess, the list is still considered fairly reasonable:
- The system should be, if not theoretically unbreakable, unbreakable in practice.
- Compromise of the system should not inconvenience the correspondents.
- The key should be rememberable without notes and should be easily changeable.
- The cryptograms should be transmissable by telegraph.
- The apparatus or documents should be portable and operable by a single person.
- The system should be easy requiring neither knowledge of a long list of rules nor involving mental strain.
One of his six later became known as 'Kerckhoffs Principle' because of its importance. The Principle states that: In a cryptographic system the security of the system should not be dependent upon security of the method of encryption used (an expanded version of the second attribute). These days this is usually phrased as: The security should reside entirely within the key. This was a very important distinction and it isn't obvious why it was first made but it is one of the basic design requirements of all modern ciphers.
Sometimes it is helpful to guess at the plaintext to try and resolve the key. This process is called cribbing. A crib can be something that is expected in a message or a simple statistical test, cribs are extremely useful tools.
Once in a while a cryptanalyst will decide to use 'seeding', the process of forcing information into cribs by controlling events. There are many good examples but one will suffice. One of the roles of German U-Boats during the Second World War was to gather meteorological information while at sea. They would also report information about things they spotted while making their observations. Sometimes the British could arrange for an 'event' to be witnessed by such a U-Boat, perhaps an aircraft laying naval mines. Knowing that this information would be included in the weather report the codebreakers had some cribs to work with, such as the words 'aircraft and 'mine'.
With the advent of serious statistics into cryptanalysis there was another neat use for cribbing. If you know that the system works a certain way then you can predict the key if you guess the plaintext and have the ciphertext. If you have sufficient depth then you can begin an analysis of the encrypted messages as follows:
- Assume that the first letter in the first message is 'A'.
- Calculate the key letter that would have been needed to encrypt 'A' into the ciphertext letter that appears.
- Apply that key to all the other first letters from the other messages.
- Repeat the process assuming that the first letter in the first message is 'B' and so on.
- Check the frequency distributions and relative proportions of the letters, in many cases one will stand out as most probable.
- Look for repeating sequences to try and determine key length and cipher type.
The principle is pretty simple, if the cipher is Vigenère and two frequent characters appear at a given location in the depth and are one character apart then they are fairly likely to represent 'S' and 'T', very unlikely to represent 'W' and 'X'.
Another example of cribbing was used with Enigma. As mentioned above the message may have been seeded with a known word but locating that word in the message is still a problem. Enigma has a funny feature, though. No letter can ever be encrypted as itself. This means that the cryptanalyst can write out the crib and move it along the ciphertext. If any letter matches then the crib is in the wrong place. A good shortcut.
Enter the Mathematicians
Over the course of the twentieth century mathematics slowly began to dominate the field of cryptanalysis. The originator of the movement could be said to be the American William Friedman, he was able to redefine all the little tricks and heuristic rules used by cryptanalysts and connect them with the mathematical field of statistics.
Immediately the science leapt forwards. Suddenly all the established statistical procedures could be used to attack ciphers and the world was not slow to take notice.
With the advent of digital cryptography there has been a similar, more mathematical, approach made to the problems of breaking such ciphers.
Remembering that Kerckhoffs had defined the ideal situation in which all the security of an algorithm resides in its key it is possible to extend cryptanalysis under this assumption. Specifically, if the key provides all the security of an encryption system then there is no need for the users of the system to protect any other components. Imagine if the Germans had handed some Enigma machines, all the operating instructions and all the blueprints for the machines to the British at the start of the Second World War and this act of kindness in no way compromised the security of German communications.
The Kerckhoffs assumption has become so well ingrained in modern cryptology that these days the best experts in the field always recommend using ciphers once their algorithms have been published and thoroughly examined. Though it sounds strange, keeping the algorithm secret can actually reduce the security of the system. What this means in practice is that most modern encryption systems are fully understood by potential attackers and this leads the way to modern types of attack.
All the attacks so far discussed are categorised as ciphertext-only. This means that the potential attacker knows the transmitted ciphertext and the algorithm but does not know the plaintext or the key. It may seem rather strange to care about breaking a cipher if the key or the plaintext is already known but remember that all the security resides in the key and the key could be changed at any time. By analysing the algorithm it may be possible to find a 'shortcut' method permitting future keys to be broken substantially more quickly than the time required to try each potential key in sequence until the correct one is found, this being the brute-force attack.
As has already been seen, recent encryption algorithms often use massive keyspace, large blocklength and multiple rounds of operation to increase complexity. These make brute-force attacks time-consuming. Having said that, even a randomly selected monoalphabetic substitution cipher with a single character blocklength has a huge keyspace, it is the weaknesses of the algorithm that are exploited for sensible attack.
It can be easier to understand the operation of a cipher if the attacker has access to both the ciphertext and the plaintext. This type of attack has been exploited since before the advent of digital cryptology, it is a common method for gaining the first results from a new cipher which is replacing one already broken.
It would not be uncommon for a complex military radio network to transmit a message from a field commander to headquarters in one cipher and for headquarters to send it on to the government in a different system. If either system is already broken then the result is to hand the attacker both the transmitted ciphertext and the underlying plaintext from the other system.
Cribbing is a specialised form of known-plaintext attack. It might be called an assumed-plaintext attack. The details of cribbing are discussed above.
This could be seen as a cousin of the known-plaintext attack and can be even more powerful. In this attack the ciphertext and the plaintext are both known but the plaintext is also selected specifically to have certain mathematical characteristics. Identifying these characteristics in the ciphertext can lead to a break in the cipher.
This is an even more powerful relative of the chosen-plaintext attack. A series of plaintexts are chosen based not only on the basic mathematical assumptions but also on the results of the previous attempts. A cryptanalyst gets the opportunity to try a succession of different types of plaintext to see which will yield the most promising results.
Described in 1990 by Biham and Shamir this technique is a variant of the chosen-plaintext attack. Two plaintexts are chosen with specific differences and each is encrypted with the same key. The two ciphertexts are subjected to a mathematical analysis which can reveal a group of probabilities associated with the value of each bit in the key. If repeated sufficiently often it can become statistically clear whether a key-bit is likely to be set or cleared.
When cryptologists attempted to use this technique on DES they discovered to their surprise that DES was not susceptible to this type of attack. The US National Security Agency had made changed to the design of DES before it was approved, since no public statement had ever been made as to why these changes were made there was always a thought that the NSA might have intentionally weakened DES for their own purposes. In 1990 it became clear that the reason for the changes was to protect the cipher from differential cryptanalysis. The NSA had known about the technique long before it was published to the academic world.
Another probabilistic attack, this time requiring a succession of binary addition modulo 2 (XOR) calculations between the plaintext and the ciphertext. This means that it is also a known-plaintext attack. DES proved not to be protected against this attack so it is reasonable to suppose that the NSA was not aware of this technique. Linear cryptanalysis is mathematically promising but has not been practical in any known applications.