MovieSpot - An Integrated Movie Recommender System Based on Visual Context And Social Rating
Project of CS174, Fall 2012

Team Members

Ruzi Zhang

Shiyuan Jiang

Yibo Long

Introduction

Movie recommendations can be based on many kinds of signals. Most recommender systems (RS) rely on users’ social connections or ratings. Some additionally use keyword context and time to filter related movies. These solutions usually learn from simple features.

But the features and content of a movie are too complex for a simple relationship to describe. Beyond the title and keywords, how the director handles the film’s artistic and dramatic aspects, how the actors hold the audience’s attention, and similar qualities matter more when people evaluate a movie.

One interesting way to present a movie is the moviebarcode, a simple but elegant visual summary. It conveys not only a movie’s content, but also its quality and taste. Hosted on Tumblr, moviebarcode compresses a movie into a colorful, elongated barcode.

Above is an example of the moviebarcode for The Matrix[13], whose color mood feels gloomy and mysterious. We therefore think it may be a promising idea to involve visual patterns (such as barcodes) in a movie RS.

So our goal is to build a movie RS that integrates two components: social rating and movie context with visual patterns. However, generating moviebarcodes is time-consuming and involves copyright issues. It is also almost impossible to get digital copies of old movies. We therefore choose to collect posters for the movies in the MovieLens data set instead.

Having both components in the RS is advantageous. On the one hand, by comparing the results of the two components, we can tell the difference between a movie’s own properties and its social influence. On the other hand, such a system can make predictions even when a movie does not have enough social ratings and comments.

Problems

A complete RS should include predictions from both the context and the social ratings of a movie. So we will build our system on both tracks and evaluate whether combining them improves the recommendations. Most importantly, it gives us a new way to critique a movie.

Through a context-based RS we try to evaluate a movie in every aspect. If the system could go through a whole movie and critique it, it would act like an expert. That is why we have high expectations for visual elements such as barcodes and posters.

Therefore, the comparison between the context-based and social RS serves not only to build a hybrid RS, but also to find and learn what entertains audiences most. When a strong context-based recommendation differs from the social one, it may suggest that some elements are less important for recommendation, and that the director should shift attention to other points that attract the audience.

Context-based Recommender System

A context-based recommender system has an advantage over social ones, as it needs only context to make a prediction. For rare and new movies, there are too few ratings and comments for a social RS to work with. Because it does not depend on social data, a context-based RS avoids problems that a social RS has, such as data sparsity, scalability, and cold start[12].

The context of a movie will be represented as a vector in a vector space. The following features are considered in our system.

    1. Poster

    After some research, we found that a sparse representation based on bags of visual words[6][10] is highly suitable. With this method, an image can be represented as an array of integers. Each image is divided into square blocks, and each block is represented by a color histogram. Each block is then matched to an entry in a predefined set (dictionary) of d = 10,000 such feature vectors, so the image is represented by how often each dictionary entry occurs among its blocks.

    Though a poster is not like the barcode feature mentioned above, it still catches the eye by emphasizing selling points and the mood of a scene. Block histograms can therefore help capture these. These features were systematically tested and found to outperform other features in related tasks[2].

    2. Content.

    We think many content attributes contribute to the recommendation[7], such as title, genre, director, actors, and plot keywords.
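The poster representation in item 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the image is a plain 2-D list of (r, g, b) tuples, the histogram uses a hypothetical 2 bins per channel, and the dictionary holds just two hand-made entries instead of d = 10,000 learned ones. All function names are illustrative.

```python
from collections import Counter

def block_histogram(pixels, bins_per_channel=2):
    """Normalized color histogram of a block given as a list of (r, g, b) tuples."""
    hist = [0.0] * (bins_per_channel ** 3)
    step = 256 // bins_per_channel
    for r, g, b in pixels:
        # Clamp so that values near 255 stay inside the last bin.
        rb = min(r // step, bins_per_channel - 1)
        gb = min(g // step, bins_per_channel - 1)
        bb = min(b // step, bins_per_channel - 1)
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    return [h / len(pixels) for h in hist]

def split_blocks(image, block_size):
    """Yield the pixels of every full block_size x block_size block of the image."""
    height, width = len(image), len(image[0])
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            yield [image[y][x]
                   for y in range(top, top + block_size)
                   for x in range(left, left + block_size)]

def nearest_word(hist, dictionary):
    """Index of the dictionary entry closest to hist (squared Euclidean distance)."""
    def dist(word):
        return sum((a - b) ** 2 for a, b in zip(hist, word))
    return min(range(len(dictionary)), key=lambda i: dist(dictionary[i]))

def bag_of_visual_words(image, dictionary, block_size=2):
    """Count how many blocks of the image map to each dictionary entry."""
    counts = Counter(nearest_word(block_histogram(block), dictionary)
                     for block in split_blocks(image, block_size))
    return [counts.get(i, 0) for i in range(len(dictionary))]

# Toy poster: left half red, right half blue (assumed data for illustration).
RED, BLUE = (255, 0, 0), (0, 0, 255)
image = [[RED, RED, BLUE, BLUE] for _ in range(4)]
dictionary = [block_histogram([RED]), block_histogram([BLUE])]
print(bag_of_visual_words(image, dictionary))  # [2, 2]
```

With a real dictionary most counts would be zero, which is why the representation is described as sparse.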

The recommendation result will then be computed with k-nearest neighbors, where the similarity between movies is calculated by Euclidean distance.
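A minimal sketch of this nearest-neighbor step, assuming each movie has already been turned into a numeric feature vector of equal length. The movie ids and vectors below are made up for illustration, and how poster and content features are weighted against each other is left open here.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def k_nearest_movies(query_id, features, k=2):
    """Return the ids of the k movies closest to query_id, nearest first."""
    query = features[query_id]
    ranked = sorted((euclidean(query, vec), movie_id)
                    for movie_id, vec in features.items()
                    if movie_id != query_id)
    return [movie_id for _, movie_id in ranked[:k]]

# Hypothetical feature vectors for four movies.
features = {
    "A": [1, 0, 2],
    "B": [1, 0, 3],
    "C": [5, 5, 5],
    "D": [0, 1, 2],
}
print(k_nearest_movies("A", features, k=2))  # ['B', 'D']
```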

Social Recommender System

A social recommender system is usually based on users’ ratings and connections. The user’s social network, or users with a similar history, provides the evidence for recommendations.

Collaborative filtering (CF) is a technique aimed at solving this problem[3]. There are two models based on CF. User-based collaborative filtering predicts a test user’s interest in a test item based on rating information from similar user profiles. Item-based collaborative filtering instead uses similarity between items as the basis.
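User-based CF can be sketched as a similarity-weighted average of other users' ratings. The proposal does not name a similarity measure, so cosine similarity over co-rated movies is an assumption here, and the users, movies, and ratings are invented for illustration.

```python
import math

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity over the movies both users rated; 0.0 if none in common."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[m] * ratings_b[m] for m in common)
    norm_a = math.sqrt(sum(ratings_a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(ratings_b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

def predict_user_based(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for the movie."""
    weighted_sum = weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[movie]
        weight_total += sim
    # None signals that no similar user has rated the movie.
    return weighted_sum / weight_total if weight_total else None

# Hypothetical ratings: user -> {movie: rating}.
ratings = {
    "u1": {"m1": 5, "m2": 1},
    "u2": {"m1": 5, "m2": 1, "m3": 4},
    "u3": {"m1": 1, "m2": 5, "m3": 2},
}
prediction = predict_user_based(ratings, "u1", "m3")
print(round(prediction, 2))  # 3.44, pulled toward u2, who rates like u1
```

The item-based variant follows the same pattern with the roles swapped: similarities are computed between movies over the users who rated both.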

We will test two state-of-the-art algorithms to find which is better for our problem:

     1. K Nearest Neighbor (KNN).

Whether item-based or user-based, KNN uses the k most similar neighbors (items or users) as the basis for the prediction.

     2. Matrix Factorization.

A popular approach to matrix factorization uses singular value decomposition (SVD) to expose the similarity of items[5].
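As a rough illustration of the factorization idea: a plain SVD is not defined on a rating matrix with missing entries, so this sketch swaps in a common stand-in, fitting low-rank user and item factors by stochastic gradient descent on the observed ratings only. It is not necessarily the model of [5]; the hyperparameters and the toy ratings are assumptions.

```python
import random

def train_factors(ratings, n_factors=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Fit user factors p and item factors q so that p[u] . q[i] approximates r."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(p[u], q[i]))
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                # Gradient step on squared error with L2 regularization.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q

def predict(p, q, user, item):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(a * b for a, b in zip(p[user], q[item]))

def rmse(p, q, ratings):
    """Root mean squared error of the factor model on a list of ratings."""
    total = sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5

# Hypothetical (user, movie, rating) triples.
observed = [("u1", "m1", 5), ("u1", "m2", 1), ("u2", "m1", 5),
            ("u2", "m3", 4), ("u3", "m2", 5), ("u3", "m3", 2)]
p0, q0 = train_factors(observed, epochs=0)   # untrained baseline
p, q = train_factors(observed)
print(rmse(p0, q0, observed), rmse(p, q, observed))
```

Item similarity can then be read off the learned item vectors in q, for example by comparing them with a distance measure.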

Dataset

Evaluation will be based on the MovieLens data set[11]. We will first fetch from IMDb the related context that the data set does not include. Then we will test the context-based and social recommendation algorithms separately. Afterwards we will compare the results of the two approaches and work out a scheme to combine them into a complete recommender system.
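One way such a comparison could be set up is sketched below. It assumes ratings are available as (user, movie, rating) triples and that each recommender is wrapped as a function predict(user, movie). The hold-out fraction and the choice of RMSE as the metric are assumptions for illustration, not decisions stated in this proposal.

```python
import math
import random

def split_ratings(ratings, test_fraction=0.2, seed=0):
    """Shuffle the ratings and hold out a fraction of them for testing."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def rmse(predict, test_set):
    """Root mean squared error of predict(user, movie) on held-out ratings."""
    squared = [(predict(u, m) - r) ** 2 for u, m, r in test_set]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical ratings and a trivial baseline that always predicts the train mean.
all_ratings = [("u%d" % (n % 3), "m%d" % n, float(1 + n % 5)) for n in range(10)]
train, test = split_ratings(all_ratings)
train_mean = sum(r for _, _, r in train) / len(train)
print(rmse(lambda user, movie: train_mean, test))
```

Running both recommenders through the same rmse call on the same held-out set gives directly comparable numbers.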

Timetable

Date: Topics
by Jan 24: Completed building the development environment and collecting and cleaning the datasets. Finished writing the proposal and gave the spotlight presentation.
by Feb 19: Finish developing both the context-based and social systems, and conduct comparative experiments. Reach a conclusion on how to integrate the two approaches into a whole system.
by Mar 7: Finish system integration, UI development, and the final report.

References

[1] Woerndl, Wolfgang, Christian Schueller, and Rolf Wojtech. "A hybrid recommender system for context-aware recommendations of mobile applications." Data Engineering Workshop, 2007 IEEE 23rd International Conference on 17 Apr. 2007: 871-878.

[2] Chechik, Gal et al. "An online algorithm for large scale image similarity learning." Proc. NIPS 2009.

[3] Schafer, J et al. "Collaborative filtering recommender systems." The adaptive web (2007): 291-324.

[4] Li, Wei et al. "Design and evaluation of a command recommendation system for software applications." ACM Transactions on Computer-Human Interaction (TOCHI) 18.2 (2011): 6.

[5] Gantner, Zeno, Steffen Rendle, and Lars Schmidt-Thieme. "Factorization models for context-/time-aware movie recommendations." Proceedings of the Workshop on Context-Aware Movie Recommendation 30 Sep. 2010: 14-19.

[6] Wang, Gang, Derek Hoiem, and David Forsyth. "Learning image similarity from flickr groups using fast kernel machines." 99 (2012): 1-1.

[7] Shi, Yue, Martha Larson, and Alan Hanjalic. "Mining mood-specific movie similarity with matrix factorization for context-aware recommendation." Proceedings of the Workshop on Context-Aware Movie Recommendation 30 Sep. 2010: 34-40.

[8] Konstan, Joseph A et al. "Recommender systems: A grouplens perspective." Proc. Recommender Systems, Papers from 1998 Workshop, Technical Report WS-98-08 1998.

[9] Konstan, Joseph A, and John Riedl. "Recommender systems: from algorithms to user experience." User Modeling and User-Adapted Interaction (2012): 1-23.

[10] Grangier, David, and Samy Bengio. "A discriminative kernel-based approach to rank images from text queries." Pattern Analysis and Machine Intelligence, IEEE Transactions on 30.8 (2008): 1371-1384.

[11] http://www.grouplens.org/node/73

[12] http://en.wikipedia.org/wiki/Recommender_system

[13] http://moviebarcode.tumblr.com/