Clustering dynamic networks with probabilistic tensor factorizations
Nick Foti
Introduction
Relational data has become abundant; examples include social networks, users’ ratings of items, gene interactions, and correlations between financial instruments. This data is often represented as a network where the nodes represent the objects, e.g. users, items, genes, financial instruments, etc., and edges connect nodes that exhibit the relation. The edges may also have an associated weight indicating the strength of the relation. Most current work analyzing such data treats the networks as static objects. However, in most cases both the edges and weights change over time. For example, friend relationships on Facebook are added and deleted, users may change their ratings of items over time, genes may interact with different strengths at different stages of the cell cycle, and financial instruments exhibit dynamic correlations over time.
An important problem when analyzing relational data between nodes of a single type (e.g. a social network) is community identification (or clustering): finding sets of nodes that are “more similar” to each other than to the rest of the network. A related problem is co-clustering for networks with two types of nodes (e.g. users rating items), which aims to simultaneously cluster both types of nodes. There has been a lot of work on both clustering and co-clustering for static networks; however, algorithms for dynamic networks have received little attention.
In this work we develop probabilistic methods based on tensor factorization to cluster dynamic networks. The proposed methods solve both the clustering and co-clustering problems in their respective contexts. We develop inference algorithms and evaluate the performance of the methods on synthetic and real data sets.
Methods
In the case of static networks there are many algorithms to cluster the nodes. The most well-known graph clustering algorithm is spectral clustering [1]. There are a few variants of spectral clustering, but they all minimize the sum of the edge weights between clusters normalized by the size of the cluster, i.e. a normalized cut. Similarly, most co-clustering algorithms augment spectral clustering methods to minimize normalized cuts over both types of nodes [2].
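For reference, the normalized spectral clustering of [1] can be sketched in a few lines. This is a minimal NumPy/SciPy sketch under our own naming and numerical-tolerance choices, not the exact algorithm of [1]: it embeds nodes with the bottom eigenvectors of the symmetric normalized Laplacian and then runs k-means in the embedded space.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_cluster(W, k, seed=0):
    """Hard-cluster the nodes of a weighted graph.

    W : symmetric (N, N) weighted adjacency matrix.
    k : number of clusters.
    Returns an integer label per node.
    """
    d = W.sum(axis=1)
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Eigenvectors of the k smallest eigenvalues capture the cut structure
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    # Normalize rows to unit length, then k-means in the embedded space
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    _, labels = kmeans2(U, k, minit='++', seed=seed)
    return labels
```

On a graph with two dense blocks joined by a weak edge, the two blocks are recovered as the two clusters.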
A disadvantage of spectral clustering based algorithms is that they are hard-clustering algorithms: each node is assigned to exactly one cluster. Rather than assigning each node to a single cluster, one can ask for the posterior probability that a node belongs to a cluster, which measures the uncertainty of the learned clustering. Probabilistic non-negative matrix factorization has been used to construct a soft-clustering algorithm for networks [3] and could also be used to perform soft co-clustering.
We propose to represent dynamic networks as a third-order tensor [4] and use probabilistic formulations of tensor factorizations to perform clustering and co-clustering over time. Tensor factorization has proven effective at link prediction [5] and collaborative filtering [6]. However, learning soft clusterings from tensor factorizations has yet to be considered. Given a dynamic network on N nodes where we observe snapshots of the adjacency matrix at T specific times {s_1, s_2, ..., s_T}, we create a third-order tensor X where the (i,j,l) entry, X[i,j,l], represents the weight of the edge from node i to node j at time s_l. In Matlab notation, X[:,:,l] is the adjacency matrix observed at time s_l. We assume that each entry of X is generated from density p() as
X[i,j,l] ~ p(Y[i,j,l], theta)

with mean parameter

Y[i,j,l] = sum_{k=1}^K u_i[k] * v_j[k] * t_l[k]

for latent K-dimensional vectors {u_i}, {v_j} and {t_l}, where i, j = 1,...,N and l = 1,...,T, and theta represents any other parameters of the chosen density; this is the CP decomposition. When X[:,:,l] represents a symmetric adjacency matrix we instead use the INDSCAL decomposition

Y[i,j,l] = sum_{k=1}^K u_i[k] * u_j[k] * t_l[k]

where the vectors v_j are constrained to equal the vectors u_j.
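The tensor construction and both decompositions above are direct to express with einsum. A minimal sketch, with our own (hypothetical) function names; the factor matrices hold the u_i, v_j and t_l vectors in their rows:

```python
import numpy as np

def stack_snapshots(snapshots):
    """Stack T adjacency snapshots A_1..A_T (each N x N) into a third-order
    tensor X with X[:, :, l] equal to the snapshot observed at time s_l."""
    return np.stack(snapshots, axis=-1)

def cp_mean(U, V, T):
    """CP mean tensor: Y[i,j,l] = sum_k U[i,k] * V[j,k] * T[l,k]."""
    return np.einsum('ik,jk,lk->ijl', U, V, T)

def indscal_mean(U, T):
    """INDSCAL mean tensor for symmetric snapshots:
    Y[i,j,l] = sum_k U[i,k] * U[j,k] * T[l,k]."""
    return np.einsum('ik,jk,lk->ijl', U, U, T)
```

The INDSCAL mean is symmetric in its first two indices by construction, matching an undirected adjacency matrix at every time slice.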
The latent vectors {u_i}, when normalized to have unit L1 norm, can be interpreted as the probability distribution over the K latent clusters having generated node i's edges. The same holds for the vectors {v_j} in the CP factorization. Reading the k'th entry of each t_l vector across l gives the activity profile of cluster k over time.
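The normalization step is a one-liner. A sketch assuming the factors are learned with a nonnegativity constraint (as in NMF-style models); the function name is ours:

```python
import numpy as np

def cluster_memberships(U):
    """Row-wise L1 normalization of a nonnegative factor matrix: row i
    becomes a distribution over the K latent clusters for node i.
    The small floor guards against all-zero rows."""
    row_sums = np.maximum(U.sum(axis=1, keepdims=True), 1e-12)
    return U / row_sums
```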
We will consider both Gaussian and Poisson densities as likelihood functions for the entries of the tensors. The former yields an easy learning problem and the latter seems like a good model for network data. Based on our choice of density we will place prior distributions on the latent u, v and t vectors and learn MAP estimates, where the chosen priors will implicitly perform model selection of the number of clusters (ARD priors, for instance) [7]. The resulting learning problem is to find the matrices U, V and T that maximize
p(U,V,T | X, theta) ∝ p(X | U,V,T, theta) * p(U,V,T)

where U, V and T are matrices with the vectors u_i, v_j and t_l in the columns, and the normalizing constant p(X | theta) does not depend on the factors. Since time-varying network data sets can become very large and usually have a sparse structure, we propose using stochastic gradient descent to provide efficient learning of the latent vectors. In the case of dense networks a batch gradient descent algorithm will be used. Lastly, with the Poisson likelihood we may need to resort to a variational method for inference.
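For the Gaussian likelihood with Gaussian (ridge-style) priors on the factors, one SGD pass over the observed entries can be sketched as follows. This is a minimal sketch, not the final implementation: the function name, the entry format, and the regularization weight `lam` (which plays the role of the prior precision) are our assumptions, and a real run would also need a learning-rate schedule.

```python
import numpy as np

def sgd_epoch(entries, U, V, T, lr=0.01, lam=0.1):
    """One SGD pass for the Gaussian-likelihood CP model.

    entries : list of observed (i, j, l, x) tuples, so a sparse tensor
              costs O(nnz) per pass.
    U, V, T : factor matrices with u_i, v_j, t_l in their rows; updated
              in place.
    """
    for i, j, l, x in entries:
        y = np.dot(U[i] * V[j], T[l])           # model mean for this entry
        err = x - y
        # Gradients of the negative log posterior for this single entry
        gU = -err * (V[j] * T[l]) + lam * U[i]
        gV = -err * (U[i] * T[l]) + lam * V[j]
        gT = -err * (U[i] * V[j]) + lam * T[l]
        U[i] -= lr * gU
        V[j] -= lr * gV
        T[l] -= lr * gT
    return U, V, T
```

Shuffling `entries` between passes is standard; we omit it here for brevity.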
Given that there is very little work on clustering dynamic networks, there are accordingly few measures of the quality of the learned clusterings. The current work does not aim to create such measures. We will rely on carefully constructed synthetic data to validate our methods. In the case of real data we must appeal to qualitative inspection of the learned clusters. We will, however, hold out a test set of links for the real data sets we study and report how well we can predict edges with the model with respect to an appropriate norm.
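As one concrete choice of norm, held-out error under the CP mean could be reported as an RMSE over the test entries. A sketch with our own (hypothetical) function name and entry format:

```python
import numpy as np

def heldout_rmse(test_entries, U, V, T):
    """Root mean squared error over held-out (i, j, l, x) entries,
    predicting each entry with the CP mean sum_k U[i,k] V[j,k] T[l,k]."""
    errs = [x - float(np.dot(U[i] * V[j], T[l]))
            for i, j, l, x in test_entries]
    return float(np.sqrt(np.mean(np.square(errs))))
```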
Data
We will apply the methods to real data sets from three applications. First we will consider the World Trade Web, a network of the amount of US dollars each country exports to every other country each year from 1948-2000. Next we consider the network of correlations between stocks from the New York Stock Exchange for windows of time from 1990-2007. The window size will be chosen so as to provide a reasonable number of observations without creating a data set that is too large. Lastly, we consider word count data from NIPS for the years 1988-2003 [8]. We construct bipartite networks between authors and words, and between words and documents, for each year of data, as well as a co-citation network between authors. We will also consider networks between authors and conferences derived from the DBLP database.
Timeline
All dates below are completion dates
- 4/15 - Have all data sets cleaned and ready for analysis
- 4/22 - Implement Gaussian likelihood model (both batch and SGD)
- 4/29 - Implement Poisson likelihood model (SGD and VB if necessary)
- 5/06 - Buffer period, start milestone material
- 5/10 - Milestone presentation, start polishing final results
- 5/20 - ALL ANALYSIS COMPLETE
- 5/26 - Print poster
Note: Dates above may be subject to change and certain goals may be shifted forward or backward in time depending on progress.
References
- A.Y. Ng, M.I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 13, 2001.
- I.S. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001.
- I. Psorakis, S. Roberts, and B. Sheldon. Soft Partitioning in Networks via Bayesian Non-negative Matrix Factorization. In Workshop "Networks Across Disciplines in Theory and Applications", Neural Information Processing Systems, 2010.
- T.G. Kolda, and B.W. Bader. Tensor Decompositions and Applications. In SIAM Review, 51(3):455-500, September 2009.
- D.M. Dunlavy, T.G. Kolda, and E. Acar. Temporal Link Prediction using Matrix and Tensor Factorizations. In ACM Transactions on Knowledge Discovery from Data, 5(2):Article 10, February 2011.
- L. Xiong, X. Chen, T. Huang, J. Schneider, and J.G. Carbonell. Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization. In Proceedings of SIAM Data Mining, 2010.
- V.Y.F. Tan, and C. Fevotte. Automatic Relevance Determination in Nonnegative Matrix Factorization. SPARS 2009.
- A. Globerson, G. Chechik, F. Pereira, and N. Tishby. Euclidean Embedding of Co-occurrence Data. In JMLR 8, 2007.