Lists

[prev] [next]

There are two key aspects of any program: what to operate on (the data), and how to operate on it (the functions). Now it's time to think more about the interplay between data and functions.

One powerful approach to dealing with the function-data relationship is to use Abstract Data Types (ADTs), in which we know at a high level what the data represent (but not how), and more importantly the operations on the data. (Usually the same operations work on a wide class of data types, so these should probably be called "abstract operation types", but it's too late to change the name now.) The idea is to hide the way that the data are represented. Before a paper by Parnas in the 70's, everyone assumed that the natural thing to do was for every function to know how the data were represented and to work with that representation. That's dangerous, since if we ever change the representation, then we have to go change every function that makes use of that representation.

With an ADT, we can perform certain operations supported by an interface, but we can't directly get at the implementation by which the data are represented and the operations supported. It's like an office with a window where a clerk sits and handles your requests to save or retrieve documents. Can't tell if the documents are saved in file cabinets, are scanned into a computer database, etc. Can have multiple implementations of the same ADT. Can also change the implementation without affecting any code that only uses the interface.

So today we'll set up an ADT for representing lists. You've already used one Java list implementation, but we'll build up our own, starting with a definition of what a list is and can do (the ADT), and then how an implementation can do that. In fact, over today and then next class we'll implement the list ADT in two very different ways, making efficiency trade-offs but supporting the same uses.

Defining an ADT: inheritance vs. interface
Generics: what does it hold
Visibility: public vs. private vs. protected vs. package
Exceptions
Singly linked list implementation
Java notes

All the code files for today: ListExceptions.java; ListTest.java; SimpleList.java; SinglyLinked.java

Slides from class

Defining an ADT: inheritance vs. interfaces

We explored inheritance with animated agents, and have also used it in a number of other classes, e.g., GUI stuff. Recall that inheritance is used to say that a new subclass "is-a" superclass, plus more stuff / with more specific behaviors. Inheritance captures similarities among classes, in that we can use a subclass in any context where we can use its superclass (but not the other way around). It also propagates implementation details, as the subclass includes the instance variables and associated methods of the superclass.

A related, but different, way to capture similarities is with an interface. An interface just specifies a bunch of method headers (name, return type, parameters), but without bodies implementing them. So that's actually ideal for specifying an ADT. The interface gives the collection of operations supported by the ADT, without specifying the data representation. A very basic interface for lists: SimpleList.java. We say what a list can do, but not how it does it. In fact, we'll see two different ways how (among many others).

Why would we ever want inheritance instead of an interface? One good reason is if there is some common data or method implementations, as we had with agents — they all have a position as part of their state, and provide a method to draw themselves wherever they are. I could have instead just given an interface, leaving the implementation unspecified. Then the various subclasses would have the freedom to be represented however they wanted to, but at the same time would have the obligation to implement the operation themselves. For this case, that would have led to redundancies as they all work more or less the same way (unless they override that functionality); for a more complex case, the trade off might have been better the other way.

So why would we ever want an interface instead of inheritance? Inheritance can lead to some tricky and ambiguous scenarios. This is particularly true if a class inherits from multiple superclasses, which happen to have different instance variables with the same name or different implementations of a method with the same name. Which one is meant when? (Esp. if they're called from other objects.) For that reason, Java forbids multiple inheritance; a class can only extend one superclass. It can, however, implement multiple interfaces. There's no ambiguity there — the class provides its own implementation of each method, and doesn't rely on existing (potentially conflicting) implementations that it inherits.

As we saw with inheritance, we can declare a variable to be of the interface type, and not worry about which concrete implementation it is. An example using our list interface: ListTest.java. Note: you cannot "new" an interface, since it doesn't actually implement anything. You can only "new" a class that implements that interface.

Generics: what does it hold

Our interface is generic. We've seen this on the user side with ArrayList, which could hold different things in different cases. So we'd declare, say, ArrayList<Blob>. Now on the implementer side, when creating our own generic class/interface, we use a type variable that will consistently hold the type that gets instantiated.

public interface SimpleList<T> {
  ...
  public T get(int index) throws Exception;
  public void add(int idx, T item) throws Exception;
}

Here we have no idea what type of thing will go into our list, so we use the type variable T to stand for whatever someone might use. Thus if someone declares SimpleList<Blob>, then T means Blob, while for SimpleList<Point> it means Point. Just like any variable, it stands for something else, the somewhat odd thing is that the something else is a type. Typically we name type variables with a single uppercase letter, often T for "type", but sometimes E for "element", or as we'll see later K and V for "key" and "value", and V and E for "vertex" and "edge".

In the body of the code, that type variable consistently means the same thing, which is what gives us type safety. So for SimpleList<Blob>, T is always Blob, meaning that's all we can add and all we can get, so we can count on exactly what we can do with the stuff in the list.

Visibility: public vs. private vs. protected vs. package

We talked a bit about visibility of methods and instance variables. The ADT context is a good one in which to really consider the difference. Public access means that any method of any class can access the variable or method. Private access is limited to the given class and its inner classes. With protected access, the variable or method should be accessible to only the class and any subclass that extends the class (or extends a class that extends the class, etc.). Protected access seems to be a good compromise between public and private, and it is certainly an improvement on public.

There is a problem with protected access, however. Because anyone can extend a class, there is no real control over who can access protected instance variables. Again, there is no abstraction barrier between the superclass and its subclasses. A better way to restrict access is to make all instance variables (except constants) private and then to make the methods that can access those instance variables protected. That should prevent classes other than subclasses from accessing the instance variables, while still keeping encapsulation.

The default access, package access, is what you get if you don't put anything before the instance variable or method. I consider it a design flaw of the language that there is no package visibility level specifier, as there is for public, private, and protected. It is too easy to forget to declare an instance variable (or method) private. I would have preferred that leaving off the visibility level specifier would be a syntax error. That way the programmer would always have to make an explicit choice.

Package access makes the variable or method visible to everything in the same package. A package is a way of grouping methods together that are intended to work together as a group, like java.util or java.awt. To put a file into a package, you start the file with a package statement:

package yourPackageName;

If there is not a package statement, then all classes in the file go into the default package. (For consistency in debugging and grading, that's the standard for CS 10, though in bigger projects later, you'll probalby want to split things up into packages so that names don't interfere with each other.) You seldom want package access. Anyone can add a class to a package, so you have very little control of who has access to things that have package access. It sometimes makes sense to give package access for methods that you want as utility routines within a package, but it seldom makes sense to have package access for instance variables.

You may have noticed that when I described what protected means, I said it "should" only be available to the class and its descendents (subclasses or subclasses of those subclasses, etc.). Unfortunately, that is not what it means in Java. As we have described it, public is the least restrictive, private the most, and protected and package are both somewhere in between, but you can't say that one is more restrictive than the other. They control access in very different ways. The designers of Java decided that not having a linear hierarchy of protection levels was a bad idea. Therefore they somewhat arbitrarily decided that protected access should include package access, so that package access is more restrictive than protected. So in Java protected really means visible in descendents and to classes in the same package. If you are using the default package, that means it is not much different than public! However, I will still declare things protected to indicate who should have access.

Exceptions

You might have noticed in the SimpleList interface that a number of the methods indicate "throws Exception". We've pretty much ignored explicit exception handling so far (though no doubt you've gotten an exception at run time!), but this seems like a good time to discuss it. An exception indicates something happened at run-time that wasn't part of the "standard" flow, in that there's not a clear way to continue from this point. For example, if we ask for element -1 from a list, what should be returned? There's no good answer, so we throw an exception.

When an exception is thrown the program stops what it was doing and goes into error-handling mode. Exceptions are either caught and handled in the method that throws them or passed on to the method that called it. If that method does not catch the exception it is passed to the method that calls it, etc. If the main program does not catch the exception it kills the program and prints an error message. You probably have seen these messages for null pointers or subscript out of bounds.

Exceptions are an excellent way of passing an error on to a method that knows a reasonable way to handle them. Exceptions usually occur in a low-level method. Consider trying to open a file and failing. The file-opening method has no idea what the programmer wants to do if the file can't be opened. Maybe the method that calls it has no idea, either. If you don't know a good way to handle the exception it is best to pass it on. Eventually you will get to a high-level method that does know what to do about the situation. If the file name was just entered by the user it might be good to give her a second chance. Or it may be that the programmer has a backup file to use if the first cannot be opened. Or it could be that the right thing to do is to print an error message and quit the program. Exceptions give a way to let the low-level method that discovers the problem pass it on to a method that knows what to do about it.

In the example above, we simply passed the buck. And really, there's not much more we could intelligently do for such a simple example. But the simple example provides an opportunity to illustrate the mechanism for catching exceptions, so that then we can use that later (soon) when there is a reasonable alternative. The basic idea is to wrap a "try"/"catch" around code that might throw an exception. The "catch" specifies the type of exception it can handle (and a name for a parameter holding the exception data — it's just an object) and the code to perform when that exception is thrown. We can optionally follow the "try" with a "finally" that has code to be executed whether the code in the "try" block succeeds or fails. It is used to clean up. A typical example is an open file that should be closed, and we'll come back to that when we deal with files. For now, we typically won't need "finally".

If you look back at ListTest.java, you'll see that the main declares that it throws Exception, since it calls List methods that may. We know they won't, by reasoning about the code, but Java doesn't. If we want to "hide" that possibility, we have to wrap those calls in try-catch. See ListExceptions.java.

Note that throws Exception means a method may have an unexpected exit that should be handle, not that it necessarily will. But since it's a possibility, we have to be prepared for it.

To summarize, there are two possibilities when writing a method (here main) that invokes a method declaring that it throws Exception:

Declare that our method throws Exception, and may pass the buck
Wrap the potentially exception-throwing code in try-catch, and handle it

And actually, we could have a hybrid approach, where we try-catch, and if we don't like what we caught, we throw it on up the line.

When to go one one or another, and at what granularity? It really depends on whether we would know how to deal with the exceptional case(s). If so, wrap the try-catch just around the bit that can be handled, with an appropriate catch block for the case(s). If not, pass the buck.

We'll see in a bit how to throw our own exceptions from scratch.

Singly linked list implementation

An implements tag (analogous to extends) in a class definition is a promise to implement all the methods of the interface. Thus we can use an instance of that class wherever we declare a variable of the interface type (as in ListTest). The class may of course also include more methods beyond those of the interface, including some that only it defines, or that help it satisfy other interfaces. Let's actually implement two very different classes that meet the SimpleList interface definition. We will cover one version today, the other version next class.

You most likely saw linked lists in your earlier programming class, so I will just go over the simplest version here, SinglyLinked.java in order to demonstrate the idea and see how it implements the SimpleList.java interface. The book elaborates and has more powerful implementations (including doubly linked).

A linked list is made up of Elements, each of which holds one data item for the list, and points to the next item in the list. However, the next of the last element is null. (A doubly-linked list also points to the previous item.) Note that here the Element class is defined inside the SinglyLinked class — it's an inner class, defined within the context of a containing class. This is a reasonable thing to do for a class that nobody else really needs to know about. (It's also helpful in some cases to be able to take advantage of the fact that the inner object has access to the variables and methods of the outer object.) We could have put the whole class in a separate file, but since instances of that class are only used here, there's no reason to waste a whole file on it, and open it up to others. (Note that if you do put it in a file by itself, you'll need to make it explicitly generic there, Element<T>; here, the type variable comes along from the containing class.

The list itself then just points to the first element, the head. (Fancier implementations also keep track of the tail, but we don't need that to support the interface.) Our implementation also keeps the size, for efficient handling of the method of the same name.

(We indicate a null reference with a slash.)

To get item #idx, we advance down the list that many times, following the next pointers from the head to its next to its next, .... We then extract the item from that element. To set, we do the same thing but replace the item once we obtain the element. One tricky thing is the counting. We count from 0 as the head. So to get element #1, we need to advance once, which is exactly what the loop does — executes when n==1>0, then decrements n and terminates. In both cases, if we try to go past the end of the list (hit a null for next) we throw a friendly exception.

To add an element at an index, we need to advance one fewer step than that index, and splice in the new element as the next of that element. E.g. to add at index 1, we splice in a new element as the next of element 0 (and make the new element's next be the previous next of # 0). This gives us a special case of adding at index 0: make the new element the head and its next the old head.

To remove we do much the same thing, but splice out the removed element by setting the next of element idx-1 to the removed element's next.

Finally, we can make a printable string representation by iterating down the list, starting from the head and adding a string representation of each element's data along with a separator between the elements.

Java notes

interface: An interface specifies a set of methods, but no instance variables, and does not provide implementations for those methods. When a class "implement"s an interface, it must provide all the method implementations.
inner class: A class can be defined inside another class. This is useful, e.g., if it just supports the work of that class. The inner class then has access to the outer class variables and methods.
null: This indicates a non-existent object.
throws: This annotation on a function indicates that it can pass up an exception to its caller (which then must handle it or pass it on).
try/catch/finally: Exception-handling machinery is introduced in more detail in the body of the notes. This is the syntax for wrapping up some code that might throw an Exception.
System.err: Same idea as System.out except it's meant for error messages.