Collaborative Filtering Algorithm Applied to MovieLens Data

Introduction

Many e-commerce applications use recommender systems to predict which products existing or new users might like. The predictions may be based on the ratings stored in a database and/or on other side information. Two well-known approaches to making such predictions are content-based filtering and collaborative filtering. In content-based filtering, a new item is recommended based on its similarity (e.g. shared characteristics or attributes) to items the user previously liked. Collaborative filtering, on the other hand, relies only on users' past ratings: if a user x gives a high rating to items i_1 and i_2 (where a high rating means the user likes the item very much), then another user u who gave a high rating to item i_1 is likely to rate item i_2 highly as well.

Problem Definition

We have a list of n users {u_1, u_2, ..., u_n} and a list of m items {i_1, i_2, ..., i_m}, arranged as an n x m matrix whose entries are the ratings given by a user to an item. The aim is to predict the unobserved ratings based on the users' prior preferences. Note that the matrix is usually very sparse.

Algorithms

Several state-of-the-art algorithms are listed below:
Project Goal

The goal of this project is to implement some of the matrix factorization algorithms, with a focus on an incremental variant of SVD. To gain an appreciation of the different work done in this field, the aim is to start with the basics and increase the difficulty and complexity as more understanding is acquired.

Dataset

The MovieLens data will be used [3]. This data was collected by the GroupLens Research group at the University of Minnesota from 1997 to 1998. There are three versions of increasing size; the one chosen has 943 users, 1682 movies and 100,000 ratings (ranging from 1 to 5). It does not include users who provided fewer than 20 ratings. The extracted archive includes u1.base and u1.test, which split the ratings in the ratio 4:1 and will be used for the training and testing parts of this project.

Timeline

On April 12: Project proposal
By April 19: Implement a basic matrix factorization algorithm
By April 26: Implement incremental SVD with regularization
By May 3: Start analysis of data and collect results
By May 8: Project milestone; write report/presentation and show results
By May 17: Catch-up time; continue the write-up; compare RMSE with M3F and latent vector
By May 24: Project should be almost complete; design poster and finish project write-up
On May 28: Poster presentation

References

[1] http://www.cs.toronto.edu/~hinton/csc2515/notes/pmf_tutorial.pdf
[2] The BigChaos Solution to the Netflix Grand Prize, Andreas Töscher, Michael Jahrer, Robert M. Bell, 2009
[3] http://www.grouplens.org/node/73
[4] Clustering Items for Collaborative Filtering, Mark O'Connor, Jon Herlocker, ACM SIGIR Workshop, 1999
[5] Restricted Boltzmann Machines for Collaborative Filtering, Ruslan Salakhutdinov, Andriy Mnih, Geoffrey Hinton, ICML, 2007
[6] Incremental Singular Value Decomposition Algorithms for Highly Scalable Recommender Systems, Badrul Sarwar, George Karypis, Joseph Konstan, John Riedl, GroupLens Research Group
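
The Project Goal and Timeline above mention a basic matrix factorization and an incremental SVD with regularization, but the proposal does not spell out an update rule. The following is therefore only a minimal sketch, assuming that the intended method is a stochastic-gradient factorization of the rating matrix with an L2 penalty, trained on the observed entries only. The function names (train_mf, predict, rmse), the hyperparameter values and the toy ratings are all illustrative assumptions; none of them come from the MovieLens data or from the cited papers.

```python
import math
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def rmse(P, Q, ratings):
    """Root-mean-square error over a list of (user, item, rating) triples."""
    squared_error = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(squared_error / len(ratings))


def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
             epochs=200, seed=0):
    """Fit factor matrices P (n_users x k) and Q (n_items x k) by SGD.

    ratings holds (user_index, item_index, rating) triples with 0-based
    indices. Only observed ratings are visited, so the sparse matrix is
    never materialised. All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Small random initial factors; the same seed gives the same start.
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the observed ratings in random order
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error plus L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Toy data (hypothetical): 4 users, 4 items, ratings on the 1 to 5 scale.
toy_train = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5), (2, 2, 5),
             (2, 3, 4), (3, 2, 4), (3, 3, 5), (0, 2, 1), (2, 0, 2)]
toy_test = [(1, 2, 1), (3, 0, 2)]

P, Q = train_mf(toy_train, n_users=4, n_items=4)
print("train RMSE:", rmse(P, Q, toy_train))
print("test RMSE:", rmse(P, Q, toy_test))
```

For the actual experiments, the toy lists would be replaced by the triples read from u1.base and u1.test, with the user and movie identifiers shifted to 0-based indices. The number of factors k, the learning rate and the regularization weight would need to be tuned on the data rather than taken from this sketch.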