Syllabus
If all goes well, we will be studying the following topics, in more-or-less the order given below.
- The data stream model and its variants
- The Misra-Gries algorithm for finding frequent items
- Counting distinct elements: Tidemark algorithm (essentially, HyperLogLog) and BJKST hashing algorithm
- The Count-Min Sketch and the Count Sketch for point queries to a frequency vector
- Frequency moments and the Tug-of-War Sketch
- Dimension reduction and the Johnson-Lindenstrauss Lemma
- Lp-norm estimation using stable distributions
- Sparse recovery and L0-sampling
- Sublinear median-finding (more generally, selection)
- Coresets and (geo)metric problems on streams
- Clustering problems on streams
- Graph streams: basic problems; spanners for distance estimation
- Maximum matching
- Triangle counting
- Linear sketching for graphs
- Lower bounds: the connection with communication complexity
- Lower bounds: Reductions of increasing sophistication
Topics and Events, Day by Day
The table below will be updated throughout the term.
# | Date | Topics | References |
---|---|---|---|
* | Mo Mar 30 | Homework 0 out | |
1 | Tu Mar 31 | Introduction; the model; Misra-Gries | My notes: Units 0, 1 zoom recording, whiteboard |
2 | Th Apr 2 | Tidemark algorithm | My notes: Unit 2 zoom recording, whiteboard |
* | Su Apr 5 | Homework 0 due by 23:59 | |
3 | Tu Apr 7 | BJKST algorithm for distinct elements | My notes: Unit 3 zoom recording, whiteboard |
3x | (x-hr) We Apr 8 | Morris counter; Median-of-Means Improvement | My notes: Unit 4 Zoom recording |
4 | Th Apr 9 | Frequent items: Count-Min Sketch and CountSketch | My notes: Unit 5 zoom recording, whiteboard |
5 | Tu Apr 14 | Frequency moments and how to estimate them | My notes: Units 6, 7 zoom recording, whiteboard |
6 | Th Apr 16 | Norm estimation, stable distributions, dimension reduction | My notes: Unit 8 zoom recording, whiteboard |
* | Fr Apr 17 | Homework 1 due by 23:59 | |
7 | Tu Apr 21 | Sparse recovery and weight-based sampling, I | My notes: Unit 9 zoom recording, whiteboard |
8 | Th Apr 23 | Sparse recovery and weight-based sampling, II | My notes: Unit 10 zoom recording, whiteboard |
9 | Tu Apr 28 | Data streaming lower bounds, a first look | My notes: Unit 18 zoom recording, whiteboard |
* | We Apr 29 | Homework 2 due by 23:59 | |
10 | Th Apr 30 | Graph algorithms in a streaming setting | My notes: Unit 14 zoom recording, whiteboard |
11 | Tu May 5 | Graph algorithms: Distance estimation and spanners; Matchings | My notes: Unit 14, 15 zoom recording, whiteboard |
* | Tu May 5 | Project proposals due by 23:59 | |
11x | (x-hr) We May 6 | Max-weight matching analysis | My notes: Unit 15 Zoom recording, whiteboard |
12 | Th May 7 | Turnstile graph streams and graph sketching | My notes: Unit 16 zoom recording, whiteboard |
13 | Tu May 12 | Graph sketching: bipartiteness and MST weight | My notes: Unit 16 zoom recording, whiteboard |
* | We May 13 | Homework 3 due by 23:59 | |
14 | Th May 14 | Median and selection, Munro-Paterson | My notes: Unit 11 zoom recording, whiteboard |
15 | Tu May 19 | Clustering—I: coresets, Euclidean MEB | My notes: Unit 12 zoom recording, whiteboard |
16 | Th May 21 | Clustering—II | My notes: Unit 13 zoom recording, whiteboard |
17 | Tu May 26 | Further lower bounds | My notes: Unit 19 zoom recording, whiteboard |
17x | Tu May 26 | Triangle counting: an introduction | My notes: Unit 17 zoom recording, whiteboard |
18 | Th May 28 | Final project presentations—I | See projects tab zoom recording |
* | Fr May 29 | Homework 4 due by 23:59 for those presenting projects on Jun 2 | |
* | Mo Jun 1 | Homework 4 due by 23:59 for those who presented their projects on May 28 | |
19 | Tu Jun 2 | Final project presentations—II | See projects tab |