The RSA public key cryptography system allows anyone to send coded messages using a public key. Only the owner who possesses the secret decoding key can decode the message. This entry discusses the system and gives methods of implementing it.
Most of us played around with cryptography when we were young, in the form of simple codes1. These were usually of the "add 1 to each letter" variety, so that OPEN THE POD BAY DOOR HAL became PQFO UIF QPE CBZ EPPS IBM. Of course, adding 1 to Z wraps around to give A.
More sophisticated codes, more accurately called cryptography systems, involve the use of a key. For example, using the key (11,23,2,18), we add 11 to the first letter, 23 to the second, 2 to the third and 18 to the fourth letter. We then start again, adding 11 to the 5th letter, 23 to the 6th and so on. This is much harder to decrypt, unless you know the key, in which case it is a doddle.
It is possible to devise cryptography systems based on a key which are effectively impossible to decrypt without knowing the key. Two cooperating banks, for example, may want to send secret details of money transfers. They agree on a key and use it to encrypt and decrypt all messages. If more banks are involved, the situation gets more complicated. Each pair of banks must have a secret key known only to the pair. For ten banks, this amounts to 45 separate keys to be kept track of.
A problem with the traditional keyed cryptography system is that both sender and receiver need to know the key. At the time the key is agreed upon, it must be transmitted from one to the other. This means that the key may be intercepted and discovered by a rival. The rival can then decode any message which they intercept and can also send fake messages which the receiving bank will assume must have come from the sending bank.
The Public Key System
Great steps forward in cryptography were made with the invention of the Public Key Cryptography system. This was first proposed in 1976 by Diffie and Hellman. It was implemented in 1978 by Rivest, Shamir and Adleman. It has been known ever since as the RSA system.
In the public key system, each user of the system who wants to receive messages has two keys:
One key, which is public, is used to encrypt the message.
The other key, which is known only to the user, is used to decrypt the message.
You can't decrypt the message without the decryption key.
You can't work out the decryption key from the encryption key.
If Bank B wants to send a message to Bank A, it uses Bank A's public encryption key to convert the message into code. It then transmits the message to Bank A. Bank A uses its secret decryption key to decrypt the message.
In this way, anyone can send a message to Bank A which only Bank A is able to read. Even the sender can't decrypt the message which he is sending.
Similarly, if Bank A wants to send a message back to Bank B, it uses Bank B's public key.
What makes it secure?
The security of the RSA public key system is based on the fact that it is easy to multiply two numbers together but it is not easy to produce the original number given the product. For example, it is easy to check on paper that 373 × 419 = 156,287. However, if you are given 156,287 and asked to find the original two numbers, known as factors, it will take you quite a while, even with a calculator.
Of course, a computer will break up a number such as 156,287 very quickly. But this is a problem which becomes very difficult as the numbers get bigger. Even a computer takes a long time if the numbers are large. My PC took 6.2 seconds2 to find the factors of 94,828,619. Finding the factors of 100,000,980,001,501 took it so long that I gave up waiting. It is reckoned that if presented with a number of 200 digits, even the fastest computers in the world can't find the factors in less than a billion years.
The RSA system is based on this fact. Both the secret and the public key are calculated from two large numbers which only the owner knows. The public key is the product of these two numbers, accompanied by a small number. To crack the code, all you need to do is find the factors of the public key, but there is no known way of doing this in less than a billion years. At present the system appears to be secure. Perhaps tomorrow, some mathematician will discover a method for factoring large numbers.
How does it work in detail?
The RSA system requires you to do simple arithmetic to some very large numbers. Most programming languages impose a limit of about 10 digits on the size of numbers. If you want to handle numbers larger than this, you will have to store them in arrays of digits and write the algorithms for addition etc. yourself.
Another way is to use a programming language which can handle large numbers. The only two I know of are Scheme and Mathematica. I will use Scheme in my examples, but since it is a very peculiar-looking language, I will also give an explanation in a more readable form.
Encoding a message
The public key consists of a pair of numbers, e and n. The first number e is fairly small, for example 181, while the second number n is very large, for example 200 digits in length.
The message is converted into a number which must be less than n. This can be done by writing each letter as a three digit ASCII code and stringing them all together. Space = 032, A = 065, B = 066, C = 067 etc. The message should be broken into chunks so that the numbers never exceed n. In this code, 'HELLO' is represented by the number 72,069,076,076,079.
Let m represent the message number. To encrypt this number, calculate:
c = me (mod n)
That is, raise m to the power of e, and calculate the remainder after dividing this number by n. This number c is the encrypted version of the message.
It may look like a formidable task to be raising something to the power of 181, but there is in fact an algorithm for doing it quickly, which I will discuss later. Just as well, because the decryption process involves raising a number to much larger powers than this.
An example of encryption
Note: to avoid confusion, I will write large strings of digits without the normal commas after every three digits.
In this simplified example, we'll choose a public key of (e=3, n=141553). This is small simply for the purpose of explanation. Suppose we want to send the message HELLO. First, we write it as numbers:
072 069 076 076 079
If we group it in six-digit chunks, no group will be greater than 141553, the 2nd public key number. We now have three numbers:
72069, 76076, 79
Next we calculate each of these to the power of 3. A little bit of work with pen and paper reveals:
374322116704509, 440294245366976, 493039
Now we calculate the remainder after division of each of these by 141553:
68350, 117243, 68380
These three numbers are the coded version of the original message. We can bulk each of them up to six digits by adding a leading zero where necessary and write them as a string of digits:
The only way of decrypting this is by knowledge of the secret decryption key. In this case, it can be calculated easily by factoring the public key number 141553 but in real codes this is impossible due to the size of the numbers.
Decryption of the message
As we have seen above, the public key consists of a pair of numbers (e,n). Encryption involves calculating:
c = me mod n
The secret decryption key consists of another pair of numbers (d, n) where the n is the same as in the public key. The decryption process is exactly the same as the encryption process but in reverse:
m = cd mod n
In our example, the decryption key is (93867, 141553). To decrypt the code, we divide it back up into six-digit numbers:
68350, 117243, 68380
Next we raise it to the power of 92,867 and get the remainder modulo 141,553. My computer quickly tells me this is:
72069, 76076, 79
Finally, we regroup this, then translate it back into letters:
072 069 076 076 079
H E L L O
Construction of the private and public keys
First you must think up two large numbers which themselves have no factors. Such numbers are called prime numbers. In practice, you should use very large numbers of 100 digits each. The best way to do this is to pick random 100-digit numbers and test them to see are they prime. It is not easy to test a number for prime-ness (primality). Even if you are using Mathematica, which has a prime test built in to it, it can still take a long time to check a 100-digit number; the fastest algorithm in the world running on the fastest supercomputer takes about a minute.
There is, however, a simple algorithm which can test if a number is probably prime. If it passes this test, then the chances are it is prime. If it fails the test, then it is definitely not prime. This is good enough unless you are the US Department of Defense. I give details of this in Appendix 1.
When you have got your two large prime numbers, call them p and q. Calculate:
n = pq
φ = (p-1)(q-1)
Pick e to be any number between 1 and φ which shares no common factors with p-1 and q-1. If you pick e to be a small prime number, all you have to do is check if it divides evenly into either p-1 or q-1. If it does, pick again until you find one that doesn't.
Calculate d to be the modular inverse of e to base φ. That is, d is a number between 1 and φ such that de = 1 mod φ. There is always such a number and an algorithm for calculating it is given in Appendix 2.
The public key is then (e,n) while the private key is (d,n).
In the next section, I give some algorithms for doing the necessary calculations.
Appendix 1: Algorithm for raising a to power of b modulo c
Use the notation mod(n,c) to mean the remainder after dividing n by c. We want to calculate mod(ab,c), where a, b and c are all 100-digit numbers.
There are two potential problems involved in raising a number to a very large power:
- The traditional method of multiplying takes a very long time.
- The intermediate results are very large.
Taking the first of these, suppose we want to calculate a7. The normal algorithm would be something like this, in Pseudo-BASIC:
PROD = 1
FOR I = 1 TO 7
PROD = PROD * A
Notice that to calculate something to the power of 7, seven multiplications are used. This is acceptable for small powers but won't work for ab when b is a number with 100 digits. The calculation would take "almost forever".
We can design a neat algorithm using the following two facts:
ab = (ab/2)2 when b is even
In words, a to the power of an even number b is equal to the square of (a to the power of half of b).
ab = (ab-1)*a when b odd
In words, a to the power of an odd number b is equal to a times (a to the power of one less than b).
We get an algorithm which looks something like the following, in Pseudo-BASIC:
DEFINE FUNCTION POWER(A,B)
IF B=1 THEN
ELSE IF ODD(B) THEN
END DEFINE FUNCTION
We now come to the second problem. If we try to calculate ab and then find the remainder modulo c, we find that ab is so big that there is no computer in the universe big enough to hold it.
If we take the remainder modulo c as we go along, however, the numbers never get too big to handle. Each time we do a multiplication or a squaring, we take the remainder modulo c. The answer is less than c and the intermediate results are at all times less than c2.
This gives us the final algorithm in Pseudo-BASIC as follows:
DEFINE FUNCTION POWERMOD(A,B,C)
! POWERMOD(A,B,C) calculates (A to power of B) modulo C
IF B=1 THEN
ELSE IF ODD(B)
END DEFINE FUNCTION
This is expressed in Scheme as follows:
(define (square x) (* x x))
(define (powermod a b c)
(if (= b 1)
(modulo a c)
(if (odd? b)
(modulo (* a (powermod a (-1+ b) c)))
(modulo (square (powermod a (/ b 2) c)) c)
Appendix 2: Algorithm for calculating modular inverse of a number
I've developed this algorithm based on the standard method for solving Diophantine Equations. I haven't figured out yet whether the algorithm is efficient. This is important if it is to be used for 200-digit numbers.
From memory - this has not yet been checked:
define function invmod(e,m)
if e+1 = m then
a = m
a = e
i = 1
i = i + 1
d = a[i-2] mod a[i-1]
a[i] = d
until d = 1
c = 1
for j=i to 1 step -1
if odd(j) then
c = (c * a[j] + 1)/a[j-1]
c = (c * a[j] - 1)/a[j-1]
c = c mod m
Appendix 3: Algorithm for testing a number for probable primality
To test a number p for probable primality:
- Is p mod 2 ≠ 0 ?
- Is p mod 3 ≠ 0 ?
- Is p mod 5 ≠ 0 ?
- Is p mod 7 ≠ 0 ?
- Is 2p mod p = 2 ?
- Is 3p mod p = 3 ?
- Is 5p mod p = 5 ?
- Is 7p mod p = 7 ?
If the answer to all these questions is yes, then the number is probably prime. If the answer to any question is no, then the number p is definitely not prime.
The following Scheme procedure does all this. It uses the POWERMOD procedure defined above.
(define (probable-prime? n)
(not (zero? (remainder n 2)))
(not (zero? (remainder n 3)))
(not (zero? (remainder n 5)))
(not (zero? (remainder n 7)))
(= (powermod 2 n n) 2)
(= (powermod 3 n n) 3)
(= (powermod 5 n n) 5)
(= (powermod 7 n n) 7)