Suppose you have an image file on your computer, and that file is taking up lots of disk space. What would you like to do to free up some of that space? You'd like to compress the file. That is, you want to change how you're representing that image in such a way that a) the image doesn't look any different under the new representation, and b) the new representation takes up less disk space.

Basically, there are two types of compression, lossless and lossy. When we say that a compression algorithm is "lossless," that means we can undo the algorithm and completely recover all of the original file. No information whatsoever is lost. GIF (Graphics Interchange Format) and PNG (Portable Network Graphics) are examples of lossless file formats. When we say a compression algorithm is "lossy," that means we cannot completely recover the original file after it's been compressed. Some information has been lost. The trick with lossy compression is that if you are going to lose information, you should try to lose information that you wouldn't be able to perceive as missing. JPEG (Joint Photographic Experts Group) is an example of a lossy file format.

Taking Advantage of Spatial Redundancy

One way to perform lossless compression is to take advantage of spatial redundancy in the image. What does spatial redundancy mean? Look at the following image of the King:
The white spots/pixels on his right shoulder and black spots/pixels on his face are pretty obvious. They stand out quite clearly from the background. Why do they stand out? Because those pixels have gray-scale values that are very different from the pixels around them. So if I asked you to correct these "bad" pixels, you could probably do a decent job, because you would be able to use the redundant information provided by the "good" pixels in the neighborhood to fix the "bad" pixels. You would be taking advantage of the fact that pixels tend to have intensities that are pretty close to those of their neighbors. (Note that I'm not saying this always happens. I'm only saying that it tends to happen.)

The Image Cross-correlogram

While we can intuitively grasp the concept of spatial redundancy from the above example, we should try to quantify it somehow, especially if we are going to base a compression scheme on it. After all, it won't do any good to use this scheme to compress an image where there is little or no spatial redundancy. So let's make the image cross-correlogram. To begin, let's put some axes on the King's image (which happens to have 245 rows and 226 columns):

Now, there are 256 shades of gray in the King, ranging from 0 (pure black) to 255 (pure white). Therefore, the image cross-correlogram will have 256 rows and 256 columns (if the image had only 8 shades of gray, the image cross-correlogram would have only 8 rows and 8 columns), and it will start off by having all 0s in its entries. For each pixel in the King's image, I'm going to record that pixel's intensity and the intensity of the pixel right above it. That is, in the image cross-correlogram, I will measure the intensity p(x,y) of a pixel at (x,y) (in the original image) on the horizontal axis of the image cross-correlogram, and the intensity p(x,y+1) of its upstairs neighbor on the vertical axis.

For example, suppose the pixel located at x=100, y = 82 in the King has gray-scale value 208. Then p(100, 82) = 208. Furthermore, suppose the intensity of the pixel directly above it is 190, i.e. p(100, 83) = 190. Then in the image cross-correlogram array, I will increment by 1 the entry located at (208, 190).

When I do this for every pixel in the King's image, the image cross-correlogram will tell me how many times (that is, how frequently) a pixel of intensity A occurs directly below a pixel of intensity B in the King. For example, suppose I'm interested in how many times a pixel of intensity 108 was located directly below a pixel of intensity 132 in the King. Well, I'd look at the number located at x=108, y=132 in the image cross-correlogram. If that number were, say, 731, then a pixel of intensity 108 was located directly below a pixel of intensity 132 exactly 731 times.
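As a concrete sketch (not the author's code, just an illustration assuming the image is a 2-D NumPy array of gray-scale values in 0..255, with the row index playing the role of y), the counting described above can be done like this:

```python
import numpy as np

def cross_correlogram(img):
    """Build a 256-by-256 array C where C[b, a] counts how many times a
    pixel of intensity a occurs directly below a pixel of intensity b.
    Assumes img is a 2-D array of values in 0..255, with the row index
    increasing in the y direction (row y+1 is the "upstairs" neighbor)."""
    img = np.asarray(img, dtype=np.int64)
    lower = img[:-1, :].ravel()  # intensities p(x, y)
    upper = img[1:, :].ravel()   # intensities p(x, y+1), the upstairs neighbors
    counts = np.zeros((256, 256), dtype=np.int64)
    # Vertical axis indexes the upper intensity, horizontal axis the lower one.
    np.add.at(counts, (upper, lower), 1)
    return counts
```

So with the example from the text, a pixel pair p(100, 82) = 208 and p(100, 83) = 190 would increment the entry at horizontal position 208, vertical position 190.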

Here's the image cross-correlogram for the King:

Notice how things seem to be clustered along the main diagonal. What does this imply about the original image? Remember, the horizontal axis in the image cross-correlogram corresponds to the intensity of a pixel p(x,y), and the vertical axis corresponds to the intensity of a pixel directly above it, p(x,y+1). The clustering tells me that pixels of a given intensity tend to occur near pixels of roughly the same intensity. I.e. pick a pixel at random in the original image of the King. Suppose it has gray-scale value 104. The clustering along the diagonal tells me that chances are good that the pixel directly above the one I chose at random has a gray-scale value that's close to 104.

Ok, we're almost at the point when we can actually compress the King.

By examining the image cross-correlogram, I think we can agree that adjacent pixels tend to have the same gray-scale values. Another way of saying this would be to state that the difference in gray-scale values between a pixel and its neighbor will probably be small. Remember, the main diagonal in the cross-correlogram tells me that there's lots of spatial redundancy in the original image. I want to use this redundancy to compress.

A Look at the Differences: Pixels 'n' Bits

We will make one more histogram. This time we will record the difference in intensity between a pixel and its neighbor directly above it: diff = p(x,y+1) - p(x,y). Now, if at a particular location (x,y), I know the intensity p(x,y) of that pixel, and the difference in intensity between it and the pixel directly above it, I can recover exactly the upper pixel's intensity: p(x,y) + diff = p(x,y) + ( p(x,y+1) - p(x,y) ) = p(x,y+1). So I have lossless compression.
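To see that nothing is lost, here's a minimal sketch of this scheme (an illustration assuming NumPy and an 8-bit gray-scale image): store the y = 0 row of intensities as-is plus the vertical differences, then rebuild every other row exactly with a running sum.

```python
import numpy as np

def encode(img):
    # Keep the first (y = 0) row of intensities as-is, and store the
    # vertical differences p(x, y+1) - p(x, y) for all the other rows.
    img = np.asarray(img, dtype=np.int16)  # differences range over -255..255
    return img[0, :], np.diff(img, axis=0)

def decode(first_row, diffs):
    # Undo the differencing: each row is the row below it plus its difference,
    # which a cumulative sum down each column computes in one shot.
    rows = np.vstack([first_row, diffs])
    return np.cumsum(rows, axis=0).astype(np.uint8)
```

Running `decode(*encode(img))` returns an array identical to `img`, pixel for pixel, which is exactly the lossless property described above.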

Before we get to the picture ... A pixel will have an intensity between 0 and 255. Therefore, the largest difference there can be between the intensity of a pixel and its upstairs neighbor will be 255 - 0 = 255. Also, the smallest difference there can be is 0 - 255 = -255. Note that I'm not talking about the absolute value of the differences. If I were, I wouldn't know whether the upper pixel, (x,y+1), were brighter or darker than the pixel at (x,y). I.e. I wouldn't know if p(x,y+1) = p(x,y) + diff or p(x,y+1) = p(x,y) - diff. So I need to know if the difference is greater or less than 0.

Now to represent a number between 0 and 255, I need 8 bits. That means, to represent numbers between -255 and 255, I need 9 bits (not 16 bits). So it looks as though I've made matters worse by storing the differences. I need more bits now (storing differences) than before (just storing plain ol' pixel intensities). Right? Maybe not, it turns out! Let's look at the histogram of the differences p(x,y+1)-p(x,y):

It looks like most of the differences are between, say, -31 and 31. Ok, how many bits do we need to represent a number between -31 and 31? Since you need 5 bits to represent a number between 0 and 31, you need 6 bits to represent a number between -31 and 31. And 6 bits is fewer than 8 bits, and definitely fewer than 9 bits. This is the basis of our lossless compression. Even though we need to record some differences using 9 bits, this last histogram tells us that most of the differences are small - we need only 6 bits for those. Therefore, the extra bits we need to store the large differences are more than made up for by the vastly greater number of bits we save when recording the small differences, where we need only 6 bits for each. And there's our lossless compression of the King - all the great taste, and less filling than the old King.
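One way to make this bit accounting concrete (a sketch, using the same sign-plus-magnitude counting as the text, where 0 to 31 takes 5 bits and a sign bit makes 6):

```python
def bits_for_difference(d):
    """Bits needed to record a signed difference in sign-plus-magnitude
    form: one sign bit plus enough bits for |d|. So anything in -31..31
    fits in at most 6 bits, while the extremes -255 and 255 need 9."""
    return 1 + max(abs(d).bit_length(), 1)

print(bits_for_difference(25))    # a small difference: 6 bits
print(bits_for_difference(-45))   # a bigger one: 7 bits
print(bits_for_difference(255))   # the worst case: 9 bits
```

The point of the histogram is that the 9-bit worst case is rare, so the average cost per difference stays well below 8 bits.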

A Specific Example

In case things are still a little cloudy, let me do a short example that illustrates the "saving the bits" phenomenon. Suppose we have the following 4-by-4 array of pixel intensities:

3      200  210  198  179
2      169  160  197  190
1      160  145  190  180
0      205  150  195  150
y/x      0    1    2    3

The numbers along the left edge and bottom row refer to the y (vertical) and x (horizontal) coordinates of the intensities.

As it stands now, since the above 16 intensities are all between 0 and 255, to record all 16 intensities I need a grand total of 16 pixels x 8 bits/pixel = 128 bits. (For this example, I'm ignoring the bits needed to record the locations, i.e. where does intensity 197 occur?) Let me make a table of the differences of pixel intensities:

x   y   Pixel Intensity   Difference p(x,y+1) - p(x,y)
0   0   205               -45
0   2   169                31
1   0   150                -5
1   2   160                50
2   0   195                -5
2   2   197                 1
3   0   150                30
3   2   190               -11

To record the intensities of the eight pixels in the third column of the table (i.e. half the total number of pixels in the 4-by-4 array), we need a total of 8 pixels x 8 bits/pixel = 64 bits. (Since these intensities are all greater than 127 and less than 256, I need 8 bits for each.) Now I need to store the differences (in order to recover the other eight pixels). How many bits do I need for these?

Of the eight differences, six of them are numbers between -31 and 31. Therefore, I need at most 6 bits to record each. For the other two differences, -45 and 50, I need 7 bits. So how many bits total do I need?

(6 differences x 6 bits/difference) + (2 differences x 7 bits/difference) = 36 bits + 14 bits = 50 bits.

Therefore, by using this "difference" representation of this array, the total number of bits I need in order to recover every pixel intensity is

(64 bits for pixels) + (50 bits for differences) = 114 bits.

So I need fewer bits, 114, in the "difference" representation than I do in the original representation (128 bits). And so I have my lossless compression! And for a few of the differences, e.g. 1 and -5, I could squeeze a little more savings by using fewer (how many?) bits to record those.
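The arithmetic in this example is easy to check in code (a sketch; the 4-by-4 array is the one above, with its rows listed here from y = 0 at the bottom up to y = 3):

```python
# The example array, rows ordered y = 0 (bottom) to y = 3 (top).
img = [
    [205, 150, 195, 150],  # y = 0
    [160, 145, 190, 180],  # y = 1
    [169, 160, 197, 190],  # y = 2
    [200, 210, 198, 179],  # y = 3
]

# The rows y = 0 and y = 2 are stored directly, at 8 bits per pixel.
stored_bits = 8 * 8

# The differences p(x, y+1) - p(x, y) recover the rows y = 1 and y = 3.
diffs = [img[y + 1][x] - img[y][x] for y in (0, 2) for x in range(4)]

# As in the text: 6 bits for a difference in -31..31, 7 bits for the rest
# (all the differences in this array fit in 7 bits).
diff_bits = sum(6 if -31 <= d <= 31 else 7 for d in diffs)

print(diffs)                    # two of these fall outside -31..31
print(stored_bits + diff_bits)  # 114 bits, versus 128 for the raw pixels
```

Running this reproduces the 64 + 50 = 114 bit total from the example.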

Admittedly, the above example isn't very dramatic, but it does illustrate the point that you can use spatial redundancy to compress an image.

Now remember that all this was based on the spatial redundancy of the original image. If there wasn't much redundancy, i.e. if the white dots in the cross-correlogram were not clustered along the diagonal but rather spread out, then this compression scheme wouldn't work very well.