CS 35/135: Data Stream Algorithms [archived]
Spring 2020 | Professor: Amit Chakrabarti

Syllabus

If all goes well, we will be studying the following topics, in more-or-less the order given below.

  • The data stream model and its variants
  • The Misra-Gries algorithm for finding frequent items
  • Counting distinct elements: Tidemark algorithm (essentially, HyperLogLog) and BJKST hashing algorithm
  • The Count-Min Sketch and the Count Sketch for point queries to a frequency vector
  • Frequency moments and the Tug-of-War Sketch
  • Dimension reduction and the Johnson-Lindenstrauss Lemma
  • Lp-norm estimation using stable distributions
  • Sparse recovery and L0-sampling
  • Sublinear median-finding (more generally, selection)
  • Coresets and (geo)metric problems on streams
  • Clustering problems on streams
  • Graph streams: basic problems; spanners for distance estimation
  • Maximum matching
  • Triangle counting
  • Linear sketching for graphs
  • Lower bounds: the connection with communication complexity
  • Lower bounds: Reductions of increasing sophistication

Topics and Events, Day by Day

The table below will be updated throughout the term.

#DateTopicsReferences
*Mo Mar 30Homework 0 out  
 
1Tu Mar 31Introduction; the model; Misra-Gries My notes: Units 0, 1
zoom recording, whiteboard
2Th Apr 2Tidemark algorithm My notes: Unit 2
zoom recording, whiteboard
*Su Apr 5Homework 0 due by 23:59  
 
3Tu Apr 7BJKST algorithm for distinct elements My notes: Unit 3
zoom recording, whiteboard
3x(x-hr) We Apr 8Morris counter; Median-of-Means Improvement My notes: Unit 4
Zoom recording
4Th Apr 9Frequent items: Count-Min Sketch and CountSketch My notes: Unit 5
zoom recording, whiteboard
5Tu Apr 14Frequency moments and how to estimate them My notes: Units 6, 7
zoom recording, whiteboard
6Th Apr 16Norm estimation, stable distributions, dimension reduction My notes: Unit 8
zoom recording, whiteboard
*Fr Apr 17Homework 1 due by 23:59  
 
7Tu Apr 21Sparse recovery and weight-based sampling, I My notes: Unit 9
zoom recording, whiteboard
8Th Apr 23Sparse recovery and weight-based sampling, II My notes: Unit 10
zoom recording, whiteboard
9Tu Apr 28Data streaming lower bounds, a first look My notes: Unit 18
zoom recording, whiteboard
*We Apr 29Homework 2 due by 23:59  
 
10Th Apr 30Graph algorithms in a streaming setting My notes: Unit 14
zoom recording, whiteboard
11Tu May 5Graph algorithms: Distance estimation and spanners; Matchings My notes: Unit 14, 15
zoom recording, whiteboard
*Tu May 5Project proposals due by 23:59  
 
11x(x-hr) We May 6Max-weight matching analysis My notes: Unit 15
Zoom recording, whiteboard
12Th May 7Turnstile graph streams and graph sketching My notes: Unit 16
zoom recording, whiteboard
13Tu May 12Graph sketching: bipartiteness and MST weight My notes: Unit 16
zoom recording, whiteboard
*We May 13Homework 3 due by 23:59  
 
14Th May 14Median and selection, Munro-Paterson My notes: Unit 11
zoom recording, whiteboard
15Tu May 19Clustering—I: coresets, Euclidean MEB My notes: Unit 12
zoom recording, whiteboard
16Th May 21Clustering—II My notes: Unit 13
zoom recording, whiteboard
17Tu May 26Further lower bounds My notes: Unit 19
zoom recording, whiteboard
17xTu May 26Triangle counting: an introduction My notes: Unit 17
zoom recording, whiteboard
18Th May 28Final project presentations—I See projects tab
zoom recording
*Fr May 29Homework 4 due by 23:59 for those presenting projects on Jun 2  
 
*Mo Jun 1Homework 4 due by 23:59 for those who presented their projects on May 28  
 
19Tu Jun 2Final project presentations—II See projects tab
zoom recording