CS 4, Summer 2006: Lecture 21: Cryptography

Introduction

In the electronic world, especially since the rise of the global Internet, there is an enormous need to transmit information across the network in such a way that only certain people can read it. In the physical world, we do this by putting our messages into envelopes, so that you can't see what's inside. In the electronic world, how do you take a message -- which is basically a sequence of bits -- and make it so it can't be seen without "opening an envelope"? This is the subject of cryptography.

Basic Terminology for Cryptography

Alphabet table:

   A   B   C   D   E   F   G   H   I   J   K   L   M
   0   1   2   3   4   5   6   7   8   9   10  11  12

   N   O   P   Q   R   S   T   U   V   W   X   Y   Z
   13  14  15  16  17  18  19  20  21  22  23  24  25

Simple example: Caesar Cipher

Example: Encrypt "DARTMOUTH"

      D  A  R  T  M  O  U  T  H
      3  0  17 19 12 14 20 19 7
+ 3   6  3  20 22 15 17 23 22 10
--------------------------------
      G  D  U  W  P  R  X  W  K

Now decrypt "GDUWPRXWK"

      G  D  U  W  P  R  X  W  K
+ 23  29 26 43 45 38 40 46 45 33
% 26  3  0  17 19 12 14 20 19 7
--------------------------------
      D  A  R  T  M  O  U  T  H

A problem: if somebody knows you used the Caesar cipher, they can just reverse the algorithm themselves. A solution is to use a secret key -- a piece of information known only to the sender and recipient of the message. Instead of adding 3 to the plaintext alue, we could add any number between 0 and 25 -- that's the secret key k. This is called a shift cipher. New rules:

Kerckhoff's Principle: Assume everybody knows the algorithm, but that only the sender and recipient know the key. This way, the secrecy of the message does not depend on keeping the algorithm secret.

With this simple cipher, it would still be trivial for someone to just try all the possible keys until they find the one that gives them a meaningful message. Worse, notice that each time a plaintext character is encrypted, it is replaced by the same ciphertext character (both "T"s in DARTMOUTH become "W"s). This makes it easier to crack the code, because plaintext words with the same letter in multiple positions will generate ciphertext words with the same property.

So, the Caesar cipher isn't the greatest, but it illustrates some important ideas:

Modern algorithms are more complex than this, and combine substitution and permutation in sophisticated ways. They also use more possible keys; instead of 0-25, keys are typically 64-bit, 80-bit, 128-bit values. With more possible values, it's hard for someone to simply loop over all possible keys, trying to guess the right one (a "brute force" attack). The "strength" of an encryption algorithm is often rated by the size of the key ("128 bit encryption"). This is somewhat misleading, since it assumes there is no better way to solve the algorithm (which is hard to prove) and different algorithms have different properties. Some keys can be "weak" (example: Shift cipher with k = 0).

The Key Distribution Problem

When you encrypt a message with a key k and send it to your friend, you also somehow need to get the key to your friend so (s)he can decrypt the message. How? In person (are they nearby)? E-mail (how do you make it safe)? Phone (is it tapped)?

If you really need to use encryption, you have to have some way of making sure everybody has a copy of the secret key. This is called the key distribution problem. There are a number of ways to solve it, most of them fairly cumbersome (especially given the need to change keys, deal with lost keys, etc.).

In 1976, Whitfield Diffie and Martin Hellman proposed a novel idea: use two distinct but related keys, k1 and k2. A message encrypted with k1 can only be decrypted with k2. Rough analogy: A safe with a thin slot on the top locked with a padlock (k1), and a door with a combination lock (k2). If you have access to the slot, you can put a message into the safe. But, you can't get it back out without knowing the combination to the main door.

This idea is called public key encryption, or sometimes assymetric encryption. Each person has two keys: one that is kept secret (the private key), and one that is distributed to anyone and everyone (the public key). To send a message to Fred, you get a copy of Fred's public key, and use it to encrypt a message. Fred has the corresponding secret key, and can read the message; but nobody else can.

One great feature of this approach: for n people, you now only need 2*n keys total: Each of the n people has a public and a private key. In other approaches, you need a key for each pair of people, for a total of (n * (n - 1))/2 = (n2 - n)/2 secret keys. Another feature: two people don't have to arrange in advance to have secret communications. They can just obtain each other's public keys, which are not secret.

Diffie-Hellman

What's the secret behind Diffie-Hellman? A mathematical means by which it is possible for two parties to exchange secret keys securely, even if someone else is watching. (This is not, by itself, encryption, but it formed the basis for a number of other algorithms later.)

A little math....

It's not too hard to compute ordinary logarithms (it can be done fairly quickly with a computer). The discrete logarithm problem, however, can be quite difficult, unless you know some properties about m. Because of the "wrap-around" effect of the modular arithmetic, raising a number to larger and larger powers makes it bounce all around, sometimes larger, sometimes smaller (recall the definition of a generator). If m is very large, solving (xy % m) for y takes far longer than actually computing (xy % m) itself. This difficulty allows us to tell people exactly what we're doing, and still be confident that they won't discover our secret (Kerckhoff's principle).

The Diffie-Hellman Algorithm:

The two values computed by Alice and Bob are equal! But an observer who does not know the values of a and b cannot reproduce this value, even though they know all the other values: p, g, A, and B. (And since g is a generator, A and B could be any number -- there's no bias.)

An important thing to note about this algorithm is that it does not encrypt data. It simply gives Alice and Bob a way to compute a value which they alone know. But that is an important step, since that is exactly the problem we need to solve in order for Alice and Bob to use encryption! In other words, this helps solve the key-distribution problem.

RSA (Rivest, Shamir, and Adleman, 1978)

This was the first real public-key algorithm. It is based on some of the same ideas as the Diffie-Hellman algorithm, but it allows the user to encrypt and decrypt data, not just exchange keys.

Algorithm:

What's happening here:

So with RSA, we represent messages as integers in the range 0 .. n-1, and encrypt and decrypt using modular exponentiation. If you know e, and you want to find d, it's easy. There's a simple algorithm given e and m to find d such that (e * d) % m = 1. But that means you have to know m = (p - 1) * (q - 1). To do that, you have to factor n into prime factors. We don't know any fast way to do that for large values. With current technology, special-purpose supercomputers can factor a modulus of up to 174 decimal digits, about 576 bits (in 2003). At the current state of the art, typical RSA key sizes are 1024-2048 bits (308-616 decimal digits), and would take hundreds to thousands of years to factor using currently-available technology (note that each additional digit adds a multiple of ten to the amount of work it takes to factor). Once again, we can tell everyone the algorithm, but the difficult computation keeps our secret safe.

Crypto applications

Some examples:

Unfortunately there's a lot of bad encryption out there, and it can be hard to know what's good and what's bad. Cryptography is a black art, and we don't know much. Received wisdom: don't depend on your algorithm being secret. Use well-studied and well-established algorithms, not snake oil. The only unbreakable system is the one-time pad -- a random key as long as the plaintext, known to both parties, and used only once.

Digital Signatures

Alice and Bob are getting married, and Alice has written up a very clever prenuptial agreement. She wants to send it to Bob, who will show it to his lawyer. She wants to make sure the contract Bob receives is the same one she wrote, and that Bob can't change it before he gives it to his lawyer. The idea: