Algorithm: An ordered collection of unambiguous and effectively computable operations that, when followed produces an observable result, and completes (halts) in a finite amount of time.
A couple of examples: making a sandwich, computing sums.
Instructions a computer understands:
If test then something else another something
While test do something End of loop
The point is to define a set of simple operations that we agree are both "unambiguous" and "effectively computable", so that we can write algorithms with them.
Suppose you're given a telephone directory, which is a list of names and corresponding phone numbers. How do you look up "Smith, John" in the directory? How do you write a program to do that?
First off, we need a way of describing a telephone directory. In the real world, it's a book, on whose pages are printed a (very long) list of names and telephone numbers. The important idea is that we need to talk about an ordered sequence of values.
A typical way to represent an ordered sequence is called an
array. Like an individual value, an array is stored in the
computer's memory, but the array holds multiple values. You can think
of an ordinary variable as being a named "box" containing a value; an
array can be thought of as a "row of boxes", containing a list of
values. To refer to an array in an algorithm, we'll give it a name,
like a variable. We'll write square brackets [] after the name to
show that it's an array variable instead of an ordinary variable:
A[] is an array named A.
Since an array is ordered, we need some way to tell which value in the array is the first, which is the second, etc. To do this, we give each box in the array a unique number: 0, 1, 2, etc. To refer to some element of an array, we write the number of that element in square brackets after the array name:
A[0]-- | the first element of the array (the one numbered 0) |
A[1]-- | the second element of the array (the one numbered 1) |
| etc. |
The elements of an array are just like variables, in that they can contain values. In terms of our language of algorithms, this means we can write things like:
A[0] to 25A[7]A[1] > A[2] then ...We'll assume that we know how many elements are in an array (i.e.,
how many positions are available). We can write A.length
to mean "The number of elements in A[]".
Note: If an array has n elements, the positions are 0, 1, ..., n-1. The textbook uses positions 1, 2, ..., n (ignoring position 0).
With arrays available, one way we can represent our telephone directory is with two arrays:
N[] -- an array of names (the people in the directory)T[] -- an array of telephone numbersEach of these arrays has the same number of entries, and for any
given entry in array N[], the corresponding entry in
T[] gives the telephone number for that person. For now,
we'll assume that the directory is in some random order; we'll see
later how to take advantage of alphabetical order, and next time
how to put it in alphabetical order. For example:
i | N[i] | T[i] |
| 0 | Bob | 555-9876 |
| 1 | Evan | 555-1234 |
| 2 | Alice | 555-1234 |
| 3 | Dan | 555-0000 |
| 4 | Carla | 555-5555 |
Example: greeting everyone in the phone book.
We want to design an algorithm Search, that takes as
input arrays N[] and T[] representing a
telephone directory, along with a name who to look up.
If who is in the directory, the algorithm should print
their telephone number; otherwise, it should print "not found".
How do we do this? The basic idea is as follows:
found which is "True" if we have
found the name we are looking for, and "False" if not.
to Search given N[], T[], who:
1. Set counter to 0
2. Set found to False
3. While (found = False) and (counter < N.length) do
4. If who = N[counter] then
5 Set found to True
6. Else
7. Add 1 to counter
End loop
8. If found = True then
9. Print T[counter] -- print out the phone number
10. Else
11. Print "not found"
12. Halt.
Example: Using the directory given above, what happens when you Search for "Alice" in the directory?
steps counter found N[counter] 1,2 0 False "Bob" 3,4,7 1 False "Evan" 3,4,7 2 False "Alice" 3,4,5 2 True "Alice" 3,8,9 2 True "Alice"
It prints T[counter] which is T[2] which
is "555-1234".
How about searching for "Barbara"?
steps counter found N[counter] 1,2 0 False "Bob" 3,4,7 1 False "Even" 3,4,7 2 False "Alice" 3,4,7 3 False "Dan" 3,4,7 4 False "Carla" 3,4,7 5 False (no entry) -- counter is now equal to N.length 3,10,11
It prints "not found" because found is False.
Now we know how to find a particular entry in an array. There are a lot of algorithms that have a very similar structure -- march through an array and "do something" -- find the largest element (or the smallest one), sum up the elements, count the number of elements satisfying some test, .... The book shows how to find the largest; see if you can think through the others.
It is also possible to store values into an array, in addition to reading them out. For example, we could select a "sub-directory" of phone numbers for people whose names begin with a particular letter.
to Select given N[], T[], letter:
1. Create arrays SomeN[] and SomeT[]
2. Set counter to 0
3. Set some to 0
4. While (counter < N.length) do
5. If N[counter] begins with letter then
6. Set SomeN[some] to N[counter]
7. Set SomeT[some] to T[counter]
8. Add 1 to some
9. Add 1 to counter
End loop
10. Halt and return SomeN[] and SomeT[].
The digit-wise addition of numbers, in the textbook, likewise stores into an output array the summed digits.
We have a search algorithm that works, regardless of how the names are ordered. But we can do better, since telephone directories are sorted in alphabetical order (how they get that way is a question for next time).
i | N[i] | T[i] |
| 0 | Alice | 555-1234 |
| 1 | Bob | 555-9876 |
| 2 | Carla | 555-5555 |
| 3 | Dan | 555-0000 |
| 4 | Evan | 555-1234 |
So, if you are looking for "Barbara" and you see "Bob", you know the entry for "Barbara" either had to be before "Bob", or it's not in the directory! We can take advantage of this with only a very small change to the original Search algorithm. Let's replace lines 6-7 with some different instructions:
to Search given N[], T[], who: 1. Set counter to 0 2. Set found to False 3. While (found = False) and (counter < N.length) do 4. If who = N[counter] then 5 Set found to True 6. Else 7. Add 1 to counter 6. Else if N[counter] > who then 6a. Set counter to N.length 6b. Else 7. Add 1 to counter End loop 8. If found = True then 9. Print T[counter] -- print out the phone number 10. Else 11. Print "not found" 12. Halt.
Intuitively, this captures the logical idea: "If you're searching from start to end, you haven't found the name you want, and you see a name that is after the name you want, you can give up, because the entries are in alphabetical order."
The algorithm for Search works, and is easy to understand, but it takes a lot of steps! Each time we go back to Step (3), the beginning of the While loop, we've progressed one step further into the list. We might end up taking as many steps as there are names. But since the list is in alphabetical order, we can do better:
This idea leads to a very clever algorithm called "binary search".
to BinarySearch given (N[], T[], who):
1. Set low to 0
2. Set high to (N.length - 1)
3. Set found to False
4. While (found = False) and (low ≤ high) do
-- Find the position of the "middle" element
5. Set mid to (low + high) / 2, discarding fractions.
6. If (N[mid] = who) then
7. Set found to True
8. Else if (who > N[mid]) then
9. Set low to (mid + 1)
10 Else
11 Set high to (mid - 1)
End loop.
12. If (found = True) then
13. print T[mid]
14. Else
15. print "not found"
16. Halt.
Now, observe a very important property of this algorithm: each time we test a name and it does not match the name we are looking for (who), we discard (approximately) half the remaining values that are to be searched. This is also the same reason that "20 questions" works -- each question helps you discard about half the possible answers.
How many times can we do this? If you keep dividing by 2, and discarding fractions, you will eventually get down to 1. The number of times you can do this is the base-2 logarithm of the number of entries you started with. log2(x) is smaller than x for all values of x > 1. This means, the number of times you will go through the While loop in this binary search algorithm is smaller than in the original algorithm. In fact, as the number of entries gets bigger, the difference will become huge!