Dear Digger - This was a pretty exciting week in class. We finally introduced noise into the situation - that is, we began to discuss the possibility that messages might not arrive as sent. Implicit in this is the idea of a channel, which you can think of as the medium through which the message travels.
The idea of a channel is pretty easy to abstract. We start off with a set of input messages and a set of output messages, as well as the likelihood of receiving a given message as output, assuming that a particular message was sent. The easiest example (that isn't silly) is the binary symmetric channel. The input and output messages here are the same, {0,1}, and we assume a probability p that the message is received incorrectly (so "flipped") and thus 1-p that it is received correctly.
We also usually have a "prior" distribution on the input - a probability that each of the various inputs is sent. Together with the channel, this allows us to compute (via Bayes' rule) the "backwards" or "a posteriori" probabilities, which are the probabilities that a particular input was sent, assuming that a certain output is received.
Since our input comes with a prior distribution, it has an associated entropy, or uncertainty, or information content. The different a posteriori distributions all have their own entropies as well, which we can think of as how the uncertainty in the input changes as we vary over what output is observed. Averaging these uncertainties, weighted by how likely each output is, gives the average uncertainty in the input, given that we have observed the output. Since observation of the output should at worst tell us nothing about the input, the average a posteriori uncertainty is never more than the a priori uncertainty, and the a priori uncertainty minus the average a posteriori uncertainty is called the "mutual information" of the input and output. This measures what the output can tell us about the input and is the same as what the input can tell us about the output!
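To make the bookkeeping concrete, here is a rough sketch in Python of that computation for the binary symmetric channel: prior, then a posteriori distributions, then the averaged uncertainty, then the difference. The names entropy, bsc_mutual_information and prior0 are made up for illustration, and entropies are measured in bits (logs base 2), which is an assumption on my part.

```python
from math import log2

def entropy(dist):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(q * log2(q) for q in dist if q > 0)

def bsc_mutual_information(p, prior0):
    """Mutual information of a binary symmetric channel.

    p      -- probability that a bit is flipped in transit
    prior0 -- prior probability that a 0 is sent (hypothetical parameter name)
    """
    prior = [prior0, 1 - prior0]
    # channel[x][y] = probability that y is received, given that x was sent
    channel = [[1 - p, p], [p, 1 - p]]
    # Probability of observing each output y
    p_out = [sum(prior[x] * channel[x][y] for x in range(2)) for y in range(2)]

    avg_posterior_entropy = 0.0
    for y in range(2):
        if p_out[y] == 0:
            continue  # this output never occurs, so it adds nothing to the average
        # Bayes' rule: a posteriori distribution on the input, given output y
        posterior = [prior[x] * channel[x][y] / p_out[y] for x in range(2)]
        avg_posterior_entropy += p_out[y] * entropy(posterior)

    # A priori uncertainty minus average a posteriori uncertainty
    return entropy(prior) - avg_posterior_entropy
```

With a noiseless channel (p = 0) and a fair prior, this returns one full bit; with p = 1/2 the output tells us nothing and it returns zero.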
We can also interpret this as the information content of the channel, since however much we have lowered our uncertainty by observing the output *must* be interpreted as the amount of information that the channel has produced. The mutual information depends on the prior distribution on the input. For example, if the input is certain (some message has probability one) then the mutual information is zero (there was no uncertainty to remove). The maximum value that the mutual information can take over all priors is called the channel capacity. Interpreted as above, this is the most we can learn from our channel.
The importance of the channel capacity is that it helps to measure how quickly and reliably we can send messages over our channel. In fact, it is the highest rate at which we can still hope to send messages as reliably as we desire. This is made precise by Shannon's noisy coding theorem, a result which explains that you can always send messages as accurately as you like without completely giving up on sending messages efficiently as well.
So, we are in the situation in which we admit the possibility of an error in transmission. Really what this means is that we receive a message, and then have to guess what was sent. The most natural thing to do is guess the input which had the highest probability of producing the given output. This is conditional maximum likelihood decoding. I guess that there are other ways of choosing too, but that's for next week.
Woof-woof to you, D.R.
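P.S. In case it helps to see capacity and the decoding rule spelled out, here is a rough sketch in Python. It finds the capacity of a two-input channel by brute force over a grid of priors, which is only an illustration and not an efficient method, and it decodes by picking the input most likely to have produced the observed output. The function names and the steps parameter are made up for this example. The mutual information is computed from the output side (uncertainty in the output minus the average uncertainty in the output once the input is known), which gives the same number as the input-side calculation.

```python
from math import log2

def entropy(dist):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(q * log2(q) for q in dist if q > 0)

def mutual_information(prior, channel):
    """Mutual information in bits, where channel[x][y] = P(y received | x sent)."""
    n_in, n_out = len(channel), len(channel[0])
    # Probability of observing each output y under this prior
    p_out = [sum(prior[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
    # Average uncertainty left in the output once the input is known
    noise = sum(prior[x] * entropy(channel[x]) for x in range(n_in))
    return entropy(p_out) - noise

def capacity_binary_input(channel, steps=1000):
    """Grid-search the prior P(0 sent) for the largest mutual information.

    Returns (approximate capacity, prior probability of 0 that achieves it).
    """
    best_k = max(
        range(steps + 1),
        key=lambda k: mutual_information([k / steps, 1 - k / steps], channel),
    )
    a = best_k / steps
    return mutual_information([a, 1 - a], channel), a

def ml_decode(y, channel):
    """Guess the input x with the highest probability of producing output y."""
    return max(range(len(channel)), key=lambda x: channel[x][y])

# Binary symmetric channel with flip probability 0.1
bsc = [[0.9, 0.1], [0.1, 0.9]]
capacity, best_prior = capacity_binary_input(bsc)
guess = ml_decode(1, bsc)
```

For the binary symmetric channel the search settles on the fair prior (0 and 1 equally likely), and as long as p is less than 1/2 the decoder simply guesses that whatever was received is what was sent.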