Course Description
This course studies algorithms that process massive amounts of data; so massive that they will not fit in a computer's storage. As we shall see, this forces one to rethink even very basic algorithmic problems, such as counting the number of distinct elements, selection, and sorting. We shall study techniques to summarize such large amounts of data into succinct "sketches" that nevertheless retain important and useful information. We shall introduce the necessary mathematical tools along the way. Most techniques we shall study come from 21st century research, although a few date as far back as the 1970s. The techniques we shall study have been applied successfully to web data compression, approximate query processing in databases, network measurement and signal processing. The course itself will focus on the underlying techniques rather than the specific applications.
Prerequisites
This course requires a good background in mathematics (especially beginning probability theory and linear algebra) and exposure to formal analysis of algorithms (Dartmouth's CS 31 or an equivalent course). Preparedness will be assessed in Week 1 using a diagnostic test; see the schedule for details.
According to the ORC, the prerequisites are "COSC 31 or permission of the instructor."
Course Staff
Professor: Amit Chakrabarti
Teaching Assistants: Prantar Ghosh, Manuel Stoeckl
Textbook
There is no formal textbook for this course. My own evolving lecture notes cover most of the material and should be used in place of a textbook at least for the first half of term. As we get to advanced topics in the second half, we will sometimes study algorithms and theorems that are best described in research papers, to which I will provide pointers from the schedule.
There is a good survey by Muthukrishnan, somewhat dated at this point, but still very useful for the basics.
Work and Assessment
The assessed work in the course consists of solving several homework problems, spaced out throughout the term, and a final project. For details, please see the assessment tab.
There are no exams.