Recommendation Prediction in Social Networks

Yangpai Liu and Yuting Cheng
April 10, 2012

1. Background

Online social networking services have become tremendously popular in recent years, with sites such as Facebook and Twitter adding thousands of enthusiastic new users every day to their existing base of hundreds of millions of actively engaged users. As major platforms for building friendships and sharing interests online, these services can also flood users with huge volumes of information and put them at risk of information overload. Reducing this risk is therefore a priority for improving the user experience, and it also presents opportunities for novel data mining solutions. In particular, capturing users' interests and serving them potentially interesting items (e.g., advertisements, products) is a fundamental and crucial feature of social networking websites [1].

2. Problem

In this project, we want to solve two main problems:

Problem 1: Predict whether a user will follow an item that has been recommended to them. Items can be people, groups, organizations, and so on.
Problem 2: Predict the click-through rate (CTR) of ads. CTR, the fraction of an ad's impressions that result in a click, is a standard measure of the success of an online advertisement on a particular website. Accurate CTR prediction is important for ranking and pricing ads.
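To make the CTR definition concrete, here is a minimal Python sketch. It computes CTR as clicks divided by impressions and, purely as an illustration, ranks ads by expected revenue per impression (bid per click × predicted CTR), which is one common way a CTR estimate feeds into ad ranking. The function names and the (ad_id, bid_per_click, predicted_ctr) record layout are hypothetical and are not taken from the Tencent dataset.

```python
from typing import List, Tuple


def click_through_rate(clicks: int, impressions: int) -> float:
    """CTR = clicks / impressions, a value between 0 and 1."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    if not 0 <= clicks <= impressions:
        raise ValueError("clicks must be between 0 and impressions")
    return clicks / impressions


# Hypothetical ad records: (ad_id, bid_per_click, predicted_ctr).
Ad = Tuple[str, float, float]


def rank_by_expected_value(ads: List[Ad]) -> List[Ad]:
    """Order ads by expected revenue per impression, bid_per_click * predicted_ctr."""
    return sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)


print(click_through_rate(3, 200))  # 0.015
ads = [("a", 1.0, 0.02), ("b", 0.5, 0.05), ("c", 2.0, 0.005)]
print([ad_id for ad_id, _, _ in rank_by_expected_value(ads)])  # ['b', 'a', 'c']
```

Ad "b" wins despite the lowest bid because its higher predicted CTR gives the largest expected value (0.025 versus 0.02 and 0.01), which is why CTR prediction matters for ranking.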

3. Method

We can approach Problem 1 as a binary classification problem (the user either follows the recommended item or does not). Suitable techniques include Gaussian processes [2], ensemble classifiers [3], k-nearest neighbor (kNN) classification [4], and neural networks [5].
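As a minimal sketch of one of these options, the following Python code implements plain kNN classification with Euclidean distance and a majority vote. The two-dimensional feature vectors and their values are hypothetical placeholders for whatever user-item features would be extracted from the data; label 1 means the user followed the recommended item and 0 means they did not.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# (feature vector, label) with label 1 = user followed the item, 0 = did not.
Example = Tuple[Sequence[float], int]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train: List[Example], query: Sequence[float], k: int = 3) -> int:
    """Predict 1 (follow) or 0 (no follow) by majority vote of the k nearest examples."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training examples")
    neighbors = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # Ties go to 0; with an odd k and binary labels a tie cannot occur.
    return 1 if votes[1] > votes[0] else 0


# Hypothetical two-feature examples (e.g. made-up user-item similarity scores).
train = [
    ([0.9, 0.8], 1),
    ([0.8, 0.9], 1),
    ([0.85, 0.75], 1),
    ([0.1, 0.2], 0),
    ([0.2, 0.1], 0),
]
print(knn_classify(train, [0.7, 0.8]))    # 1
print(knn_classify(train, [0.15, 0.15]))  # 0
```

Choosing an odd k avoids ties for this two-class problem; in practice k and the feature set would be tuned on held-out recommendation logs.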

For Problem 2, suitable methods include kNN regression [6] combined with metric learning [7], and support vector regression (SVR) [8]. Because we want to explicitly constrain the predicted output to lie between 0 and 1, we will apply a sigmoid function to the raw output of the regression method.
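The following Python sketch shows how kNN regression under a learned metric might be combined with the sigmoid. It assumes the metric learning step has already produced a positive semidefinite matrix M, used in the distance (x - y)^T M (x - y); learning M itself is not shown. One further assumption, not stated in the proposal: the regressor averages logit-transformed CTR values, so the sigmoid maps the prediction back onto the (0, 1) CTR scale rather than squashing values that are already probabilities. All names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (feature vector, observed CTR in [0, 1]); hypothetical training pairs.
Example = Tuple[Sequence[float], float]
Matrix = Sequence[Sequence[float]]


def sigmoid(z: float) -> float:
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float, eps: float = 1e-6) -> float:
    """Inverse of the sigmoid; p is clipped so that log(0) is never taken."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def metric_distance_sq(a: Sequence[float], b: Sequence[float], M: Matrix) -> float:
    """Squared distance (a - b)^T M (a - b); M is assumed positive semidefinite."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def knn_ctr(train: List[Example], query: Sequence[float], M: Matrix, k: int = 3) -> float:
    """Predict CTR: mean logit-CTR of the k nearest examples under M, passed through a sigmoid."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training examples")
    neighbors = sorted(train, key=lambda ex: metric_distance_sq(ex[0], query, M))[:k]
    mean_logit = sum(logit(ctr) for _, ctr in neighbors) / k
    return sigmoid(mean_logit)


train = [([1.0, 0.0], 0.02), ([1.1, 0.1], 0.03), ([0.0, 1.0], 0.20), ([0.1, 1.1], 0.25)]
identity = [[1.0, 0.0], [0.0, 1.0]]
learned = [[0.01, 0.0], [0.0, 1.0]]  # hypothetical output of metric learning

print(knn_ctr(train, [1.0, 0.9], identity, k=1))  # nearest is [1.1, 0.1]
print(knn_ctr(train, [1.0, 0.9], learned, k=1))   # nearest is [0.0, 1.0]
```

Switching M from the identity to a matrix that down-weights the first feature changes which ad counts as nearest, which is the effect metric learning is meant to exploit. Whatever the neighbors' CTRs, the sigmoid keeps the prediction strictly between 0 and 1.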

4. Dataset

The dataset is provided by Tencent Inc., whose product Tencent Weibo is one of the largest microblogging websites in China. Tencent Weibo currently has more than 200 million registered users, who generate over 40 million messages each day.

The dataset for Problem 1 includes the recommendation log, user profiles, items, user actions, and user keywords.

The dataset for Problem 2 includes queries, purchased keywords, titles, descriptions, and user data.