Online social networking services have become tremendously popular in recent years, with sites such as Facebook and Twitter adding thousands of enthusiastic new users every day to their existing billion actively engaged users. As major platforms for building friendships and sharing interests online, they can also flood users with huge volumes of information and thus put them at risk of information overload. Reducing this risk is therefore a priority for improving the user experience, and it also presents opportunities for novel data mining solutions. Capturing users' interests and serving them potentially interesting items (e.g., advertisements, products) is thus a fundamental and crucial feature of social networking websites.[1]
In this project, we want to solve two main problems:
We can approach Problem 1 as a classification problem. Suitable techniques include: Gaussian processes[2]; ensemble classifiers[3]; k-nearest neighbor (kNN) classification[4]; neural networks[5].
For Problem 2, suitable methods are kNN regression[6] combined with metric learning[7], and support vector regression[8]. Since we want to explicitly constrain the output variable to lie between 0 and 1, we will apply a sigmoid function to the output predicted by the method.
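As a rough illustration of the output constraint described above, the following minimal sketch passes the raw prediction of a plain kNN regressor through a sigmoid. The toy data, the helper names `sigmoid` and `knn_regress`, the choice k = 3, and the plain Euclidean distance (standing in for a learned metric) are all assumptions made for illustration; they are not the project's actual features or implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function; maps any real z into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def knn_regress(train_x, train_y, query, k=3):
    """Average the targets of the k training points nearest to `query`.

    Uses Euclidean distance; a learned metric would replace math.dist here.
    """
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical toy data: 2-D feature vectors with unbounded real-valued scores.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [-2.0, -1.0, 0.0, 4.0]

raw = knn_regress(train_x, train_y, (0.2, 0.1), k=3)  # unbounded prediction
bounded = sigmoid(raw)                                # squashed into [0, 1]
print(raw, bounded)
```

Because the sigmoid is monotonically increasing, squashing the output in this way preserves the ranking of the raw predictions while keeping every value between 0 and 1.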
The dataset is provided by Tencent Inc., whose product Tencent Weibo is one of the largest micro-blogging websites in China. Tencent Weibo currently has more than 200 million registered users, who generate over 40 million messages each day.
The dataset for Problem 1 includes: recommendation log; user profile; item; user action; user keywords.
The dataset for Problem 2 includes: query; purchased keyword; title; description; user.
We expect to finish the solution to Problem 1 and the framework for Problem 2 by the milestone due date.