Project Guidelines
Please see class notes for project guidelines.
Important Dates:
- Project proposal due at 11:59 PM on Tuesday, May 17th, 2011. Please submit your proposal on Blackboard.
- Project update meetings during class time on May 25th and May 27th. I will meet each student/group once, on one of the two days.
- Final project presentations on June 1st, 2011, during class. Everyone is required to attend.
Project Ideas
Feel free to propose a project that interests and excites you. Below are some project ideas, along with links to data sets you can use if you want to pursue them.
- Hand-written Digit Recognition
- MNIST Handwritten Digits [mnist_all.mat]
Sample pictures: training and testing images for each digit, 0 through 9.
8-bit grayscale images of "0" through "9"; about 6K training examples and about 1K test examples of each class.
- USPS Handwritten Digits [usps_all.mat]
Sample pictures: images for each digit, 0 through 9.
8-bit grayscale images of "0" through "9"; 1100 examples of each class.
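As a starting point, the sketch below turns either digit file into a feature matrix X and a label vector y with NumPy. It assumes, without confirmation from these notes, that mnist_all.mat stores one variable per split and digit (train0 through train9, test0 through test9), each holding flattened 28x28 images as rows, and that usps_all.mat stores a single 256 x 1100 x 10 array of 16x16 images whose k-th slice holds class k. The function names are hypothetical; verify the variable names and the digit-to-slice mapping against the actual files before trusting the labels.

```python
import numpy as np


def mnist_from_mat(mat, split="train"):
    """Stack per-digit arrays from a loadmat-style dict into (X, y).

    Assumes (hypothetically) keys named f"{split}{d}" for d in 0..9, each an
    (n_d, 784) uint8 array whose rows are flattened 28x28 images.
    """
    xs, ys = [], []
    for d in range(10):
        block = np.asarray(mat[f"{split}{d}"])
        xs.append(block)
        ys.append(np.full(block.shape[0], d, dtype=np.int64))
    # Scale 8-bit intensities (0..255) to floats in [0, 1].
    X = np.vstack(xs).astype(np.float64) / 255.0
    y = np.concatenate(ys)
    return X, y


def usps_from_array(data):
    """Flatten a (256, n_per_class, n_classes) array into (X, y).

    Assumes slice k along the last axis holds class k; this mapping is an
    assumption and should be checked against the file's documentation.
    """
    data = np.asarray(data)
    n_pix, n_per_class, n_classes = data.shape
    # Reorder to (class, example, pixel) so each row is one image.
    X = data.transpose(2, 1, 0).reshape(n_classes * n_per_class, n_pix)
    X = X.astype(np.float64) / 255.0
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y
```

For example, `mat = scipy.io.loadmat("mnist_all.mat")` followed by `X_train, y_train = mnist_from_mat(mat, "train")` would give one row per training image, assuming the key layout above.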
- Spam Classification
- Email Dataset. Each email is a vector of 185 binary features. Most of these features indicate whether particular words occurred in the email; a few indicate things like capitalization and the presence of attachments. The complete list of features and what they represent can be found in the file features.txt. The label y takes one of two values, one corresponding to ham and the other to spam. There are 1000 training cases and 4000 test cases.
- Enron Email Data Set. This dataset contains features extracted from the emails of 150 Enron employees, made public after Enron went bankrupt. The dataset contains 12 .mat files: 6 for spam and 6 for ham messages. Each feature vector contains the number of occurrences of specific words in an email. The words corresponding to the features are listed in vocab.mat, which is included in the dataset's zip archive.
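One common baseline for binary features like the 185 in the first email dataset is Bernoulli naive Bayes. The sketch below is a minimal NumPy implementation with Laplace smoothing; it is one possible starting point rather than a method prescribed for the project, and the class name is hypothetical. It works with whichever two label values the data set uses for ham and spam.

```python
import numpy as np


class BernoulliNaiveBayes:
    """Naive Bayes for binary feature vectors, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # pseudo-count added to every feature/class cell

    def fit(self, X, y):
        X = np.asarray(X, dtype=np.float64)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Log prior log P(y = c) for each class.
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        # theta[c, j] = P(x_j = 1 | y = c), smoothed so no estimate is 0 or 1.
        theta = np.array([
            (X[y == c].sum(axis=0) + self.alpha) / ((y == c).sum() + 2 * self.alpha)
            for c in self.classes_
        ])
        self.log_p_ = np.log(theta)      # log P(x_j = 1 | c)
        self.log_q_ = np.log1p(-theta)   # log P(x_j = 0 | c)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=np.float64)
        # Per-class log posterior (up to a constant):
        # sum_j [x_j log theta_cj + (1 - x_j) log(1 - theta_cj)] + log P(c)
        scores = X @ self.log_p_.T + (1.0 - X) @ self.log_q_.T + self.log_prior_
        return self.classes_[np.argmax(scores, axis=1)]
```

For example, `BernoulliNaiveBayes().fit(X_train, y_train).predict(X_test)` returns one predicted label per test email. For the Enron word counts, one option is to binarize with `X > 0` before fitting; a multinomial naive Bayes model, which uses the counts directly, is the usual alternative.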
- Mind Reading: predict whether a person was reading a sentence or perceiving a picture
- This data set contains a time series of images of brain activation, measured using fMRI, with one image every 500 msec. During this time, human subjects performed 40 trials of a sentence-picture comparison task (reading a sentence, observing a picture, and determining whether the sentence correctly described the picture). Each of the 40 trials lasts approximately 30 seconds. Each image contains approximately 5,000 voxels (3D pixels), across a large portion of the brain. Data is available for 12 different human subjects.
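At one image every 500 msec, a 30-second trial spans about 30 / 0.5 = 60 images, so 40 trials give roughly 2,400 images of about 5,000 voxels each per subject. A simple way to turn this time series into labeled examples is to average each voxel over a fixed window of images after a stimulus begins, then label that window "sentence" or "picture". The sketch below assumes the series is a NumPy array of shape (n_images, n_voxels); the onset indices and window length are hypothetical inputs you would derive from the data set's trial information, and window averaging is just one possible feature choice.

```python
import numpy as np

SECONDS_PER_IMAGE = 0.5  # one image every 500 msec (from the description)
TRIAL_SECONDS = 30       # approximate trial length (from the description)


def images_per_trial():
    """Approximate number of images recorded during one trial."""
    return round(TRIAL_SECONDS / SECONDS_PER_IMAGE)


def window_features(series, onsets, window):
    """Average voxel activation over `window` images after each onset.

    series: (n_images, n_voxels) array for one subject.
    onsets: hypothetical image indices where a stimulus (sentence or
            picture) begins, taken from the trial information.
    Returns a (len(onsets), n_voxels) feature matrix, one row per stimulus.
    """
    series = np.asarray(series, dtype=np.float64)
    for t in onsets:
        if t < 0 or t + window > series.shape[0]:
            raise ValueError(f"window starting at image {t} runs past the series")
    return np.stack([series[t:t + window].mean(axis=0) for t in onsets])
```

Each row of the result can then be paired with a "sentence" or "picture" label and passed to any standard classifier; keeping the window length fixed gives every example the same dimensionality.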