Collaborative Filtering Algorithm Applied to MovieLens Data

Introduction

The goal up to the milestone deadline was to implement incremental singular value decomposition (SVD). This goal was met, and variants of it were also implemented. A second method used in the Netflix Prize, the Restricted Boltzmann Machine (RBM), was also attempted. This was less successful, and it is unfortunately very difficult to make sense of the many parameters that need to be optimized.

Methods

Two main methods were implemented in Matlab, both based on latent factor models: incremental SVD and the RBM.
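
To make the incremental SVD idea concrete, here is a minimal sketch. It assumes the gradient-descent factorization described in reference [3], in which latent features are trained one at a time on the observed ratings only. The project itself was written in Matlab; this Python version, the function names `train_incremental_svd` and `predict`, the hyperparameter values, and the toy ratings are all illustrative assumptions rather than the original code.

```python
import random


def train_incremental_svd(ratings, n_users, n_items, n_features=2,
                          lrate=0.01, reg=0.02, epochs=200, seed=0):
    """Fit user factors P and item factors Q to (user, item, rating) triples.

    Indices are 0-based. All hyperparameter defaults are assumptions
    chosen for the toy example, not tuned values from the report.
    """
    rng = random.Random(seed)
    # Small positive random initial factors (an assumed initialization).
    P = [[rng.uniform(0.05, 0.15) for _ in range(n_features)]
         for _ in range(n_users)]
    Q = [[rng.uniform(0.05, 0.15) for _ in range(n_features)]
         for _ in range(n_items)]

    # Train one feature at a time; earlier features stay fixed.
    for f in range(n_features):
        for _ in range(epochs):
            for u, i, r in ratings:
                # Prediction from the features trained so far plus feature f.
                pred = sum(P[u][k] * Q[i][k] for k in range(f + 1))
                err = r - pred
                pu, qi = P[u][f], Q[i][f]
                # Regularized stochastic gradient step on feature f only.
                P[u][f] += lrate * (err * qi - reg * pu)
                Q[i][f] += lrate * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i, lo=1.0, hi=5.0):
    """Predicted rating: dot product of the factors, clipped to [lo, hi]."""
    raw = sum(pk * qk for pk, qk in zip(P[u], Q[i]))
    return min(hi, max(lo, raw))


# Toy 3x3 rating matrix with the centre cell (user 1, item 1) held out.
train = [
    (0, 0, 1.0), (0, 1, 2.0), (0, 2, 2.5),
    (1, 0, 1.5),              (1, 2, 3.75),
    (2, 0, 2.0), (2, 1, 4.0), (2, 2, 5.0),
]
P, Q = train_incremental_svd(train, n_users=3, n_items=3)
held_out_pred = predict(P, Q, 1, 1)
print("prediction for the held-out cell:", round(held_out_pred, 2))
```

Only observed ratings contribute to the updates, which is what lets this approach cope with a rating matrix that is mostly empty. The learning rate, regularization weight, and number of epochs would all need tuning on real data.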
Testing

The MovieLens data set was used. It consists of 80,000 ratings in the training set and 20,000 ratings in the testing set. The rating matrix is 943 (users) by 1682 (movies). The root mean square error (RMSE) was computed between the predicted and observed ratings in the testing set.

Results

The RMSE on the test ratings was computed for each method, and Table 1 summarizes the results. The RMSE values are worse than expected because no optimization was performed.
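
For reference, the RMSE is the square root of the mean squared difference between predicted and observed test ratings. The sketch below shows that computation in Python (the project itself was written in Matlab). The function name `rmse`, the `predict` callback signature, and the toy ratings are illustrative assumptions, not the original code or data.

```python
import math


def rmse(predict, test_ratings):
    """Root mean square error of predict(user, item) over observed ratings.

    test_ratings is a list of (user, item, rating) triples.
    """
    if not test_ratings:
        raise ValueError("test_ratings must not be empty")
    squared_error = sum((r - predict(u, i)) ** 2 for u, i, r in test_ratings)
    return math.sqrt(squared_error / len(test_ratings))


# Toy test set and a constant predictor, used only to exercise the function.
toy_test = [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 5.0)]
baseline = rmse(lambda u, i: 3.0, toy_test)
print("RMSE of a constant predictor on the toy set:", baseline)
```

A constant predictor such as this one is a useful sanity baseline: a trained model that cannot beat it on the test set is not learning anything from the ratings.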
Figure 1 shows the decrease in RMSE as the number of iterations is increased (for the case of incremental SVD).

[Figure 1: RMSE versus number of iterations for incremental SVD]

Problems

Apart from the running time needed to compute the RMSE, incremental singular value decomposition did not cause much trouble. The RBM, on the other hand, caused difficulties. Knowledge of the network itself was not enough: Gibbs sampling and Contrastive Divergence also had to be implemented. In the end, the number of parameters that have to be controlled (hidden units, biases for the visible and hidden units, momentum, weights, regularization for each, and more) and the time each iteration takes led to very poor optimization and a very bad RMSE.

Conclusions

The main goal of this project has been achieved: the results of incremental SVD show that it is a good model for predicting user ratings. The implementation of the RBM, which was not the main focus of this project, was not very successful.

References

[1] Restricted Boltzmann Machines for Collaborative Filtering, Ruslan Salakhutdinov, Andriy Mnih, Geoffrey Hinton, International Conference on Machine Learning, Corvallis, 2007
[2] The BigChaos Solution to the Netflix Grand Prize, Andreas Toscher, Michael Jahrer, Robert M. Bell, 2009
[3] http://sifter.org/~simon/journal/20061211.html
[4] http://www.cs.toronto.edu/~hinton/csc2515/notes/pmf_tutorial.pdf
[5] http://www.grouplens.org/node/73
[6] Clustering Items for Collaborative Filtering, Mark O'Connor & Jon Herlocker, ACM SIGIR Workshop, 1999
[7] Improving regularized Singular Value Decomposition for collaborative filtering, A. Paterek, Proceedings of KDD Cup and Workshop