Abstract Data Types (ADTs)


There are two key aspects of any program: what to operate on (the data), and how to operate on it (the functions). Now it's time to think more about the interplay between data and functions.

One powerful approach to dealing with the function-data relationship is to use Abstract Data Types (ADTs), in which we know at a high level what the data represent (but not how it is stored), and more importantly the operations on the data. (Usually the same operations work on a wide class of data types, so these should probably be called "abstract operation types", but it's too late to change the name now.) The idea is to hide the way that the data are represented. Before a paper by Parnas in the 70's, everyone assumed that the natural thing to do was for every function to know how the data were represented and to work with that representation. That's dangerous, since if we ever change the representation, then we have to go change every function that makes use of that representation.

With an ADT, we can perform certain operations supported by an interface, but we can't directly get at the implementation by which the data are represented and the operations supported. It's like an office with a window where a clerk sits and handles your requests to save or retrieve documents. You can't tell if the documents are saved in file cabinets, are scanned into a computer database, etc. There can be multiple implementations of the same ADT where each implementation implements the same functionality, but might use different data structures or algorithms to accomplish the required functionality. You can also change the implementation without affecting any code that only uses the interface.

So today we'll set up an ADT for representing lists. Java provides two List implementations, but soon we'll build up our own. Today we will start with a definition of what a List is and can do (the ADT), and then how an implementation can do that. In fact, over the next few classes we'll implement the list ADT in two very different ways, making efficiency trade-offs but supporting the same uses.

All the code files for today: SimpleList.java; ArrayListDemo.java; StudentTrackerAppList.java

Slides from class

Defining an ADT: inheritance vs. interfaces

We explored inheritance with Students, and have also used it in a number of other classes, e.g., GUI stuff. Recall that inheritance is used to say that a new subclass "is-a" superclass, plus more stuff / with more specific behaviors. Inheritance captures similarities among classes, in that we can use a subclass in any context where we can use its superclass (but not the other way around). It also propagates implementation details, as the subclass includes the instance variables and associated methods of the superclass.

A related, but different, way to capture similarities is with an interface. An interface just specifies a bunch of method headers (name, return type, parameters), but without bodies implementing them. So that's actually ideal for specifying an ADT. The interface gives the collection of operations that must be supported by the ADT, without specifying the data structure representation or algorithms. A very basic interface for lists: SimpleList.java. Java provides are more comprehensive List ADT, but we'll keep ours a little simpler for now. In an interface we say what a List must do, but not how it does it. In fact, we'll see two different ways how (among many others).

So why would we ever want an interface instead of inheritance? Inheritance can lead to some tricky and ambiguous scenarios. This is particularly true if a class inherits from multiple superclasses, which happen to have different instance variables with the same name or different implementations of a method with the same name. Which one is meant when? (especially if they're called from other objects.) For that reason, Java forbids multiple inheritance; a class can only extend one superclass. It can, however, implement multiple interfaces. There's no ambiguity there — the class provides its own implementation of each method, and doesn't rely on existing (potentially conflicting) implementations that it inherits.

As we saw with inheritance, we can declare a variable to be of the interface type, and not worry about which concrete implementation it is. An example using the List interface: ArrayListDemo.java. Note: you cannot "new" an interface, since it doesn't actually implement anything. You can only "new" a class that implements that interface.

The List ADT

Lists are used to store multiple items of a particular data type. With a List we refer to items by their index. The first item is at index 0, and the last item is at index size-1 (where size is the number of items in the List). Lists in Java are similar to Lists in Python. The closest analog in C is an array.

Unlike an array (in C or in Java), however, Lists do not specify the number of items they hold, and they can grow as items are added. For example, previously we might have declared an array to hold say 10 Student objects. If we needed to track an 11th student, we were out of luck — there was no more memory for the 11th student. With a List, we can easily add the 11th student. We will see how soon.

Our SimpleList interface specifies that the following methods must be implemented:

Notice that we don't care about what kind of data the List holds. Using get(idx) returns the item at index idx. The functionality does not change according what type of data the List holds. In Java, however, all items in the List must be of the same data type (in Python each List item can have a different data type). But, because an item of a subclass "is a" item of the base class, a List can hold objects declared as a subclass. See Generics below.

We will discuss throws Exception next lecture, but the short answer is that if something goes wrong, the List code can tell the caller that something went wrong, and the caller can decide how to handle it.

Finally, Java provides two implementation of the List ADT for us (note that Java's interface is more complex than our simple interface). The first is called an ArrayList which implements the required List functionality using an array to store data (growing the array if it fills). The second is called a LinkedList which implements the required List functionality using a linked list. Each implementation provides the same ADT functionality, but carry out the operations differently.

Generics: what does it hold

Our interface is generic. We've seen this on the user side with ArrayList, which could hold different things in different cases. So we'd declare, say, ArrayList<Student>. Now on the implementer side, when creating our own generic class/interface, we use a type variable that will consistently hold the type that gets instantiated.

public interface SimpleList<T> {
  ...
  public T get(int index) throws Exception;
  public void add(int idx, T item) throws Exception;
}

Here we have no idea what type of thing will go into our List, and don't really care, so we use the type variable T to stand for whatever someone might use. Thus if someone declares SimpleList<Student>, then T means Student, while for SimpleList<Integer> it means Integer. Just like any variable, it stands for something else, the somewhat odd thing is that the something else is a data type. Typically we name type variables with a single uppercase letter, often T for "type", but sometimes E for "element", or as we'll see later K and V for "key" and "value", and V and E for "vertex" and "edge".

In the body of the code, that type variable consistently means the same thing, which is what gives us type safety. So for SimpleList<Double>, T is always Double, meaning that's the only type we can add to the List, so we can count on exactly what we can do with the stuff in the List.

Java notes

interface
An interface specifies a set of methods, but no instance variables, and does not provide implementations for those methods. When a class "implement"s an interface, it must provide all the method implementations.