CS134 Project Proposal

Applying Spectral Clustering to Gene Expression Analysis

Motivation: In molecular and medical biology, clustering algorithms are useful for detecting genes with similar functions and potential relationships among gene expression profiles. Traditional clustering algorithms such as k-means have already been widely applied to gene expression data. However, because of experimental limitations, gene expression data often suffer from missing values and inaccurate measurements. In the last few years, spectral clustering has become popular, and several variations of it have been developed. Compared with k-means, spectral clustering is considered more robust to noise and missing data and better at detecting unusual patterns. It is therefore reasonable to expect spectral clustering to outperform traditional k-means in this application.

In this project, I will apply recently developed spectral clustering methods to DNA microarray data. At a minimum, I will implement a variation of the spectral clustering algorithm developed by Ng, Jordan, and Weiss (the NJW algorithm) [1] and the traditional k-means algorithm. A comparison of these methods will be presented in the final report. The analysis of the NJW algorithm uses tools from matrix perturbation theory, so the method has solid theoretical support, and some of its variations are expected to perform well on real datasets. The dataset I plan to use is the yeast cell cycle data [2] from Stanford’s yeast cell cycle analysis project.
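The steps of the NJW algorithm as described in [1] can be sketched in Python as follows. This is a minimal sketch rather than the project's implementation: the function names `kmeans` and `njw_spectral_clustering`, the Gaussian width `sigma`, the `seed` parameter, and the farthest-first k-means initialization are all illustrative assumptions. Each row of `data` is assumed to be one complete expression profile with no missing values.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one integer label per row of points."""
    rng = np.random.default_rng(seed)
    # Farthest-first initialization (an assumption of this sketch): start from
    # a random row, then repeatedly add the row farthest from chosen centers.
    centers = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        d2 = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    centers = np.array(centers, dtype=float)

    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def njw_spectral_clustering(data, k, sigma=1.0, seed=0):
    """Cluster the rows of data into k groups following the NJW steps."""
    data = np.asarray(data, dtype=float)

    # Step 1: Gaussian affinity matrix with a zero diagonal.
    sq_norms = (data ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * data @ data.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip round-off negatives
    affinity = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)

    # Step 2: normalized matrix L = D^{-1/2} A D^{-1/2}.
    degree = np.maximum(affinity.sum(axis=1), np.finfo(float).tiny)
    inv_sqrt = 1.0 / np.sqrt(degree)
    normalized = affinity * inv_sqrt[:, None] * inv_sqrt[None, :]

    # Step 3: eigenvectors of the k largest eigenvalues (eigh sorts ascending).
    _, eigvecs = np.linalg.eigh(normalized)
    top = eigvecs[:, -k:]

    # Step 4: rescale each row to unit length.
    norms = np.maximum(np.linalg.norm(top, axis=1, keepdims=True),
                       np.finfo(float).tiny)
    rows = top / norms

    # Steps 5 and 6: run k-means on the rows; row i's label is point i's label.
    return kmeans(rows, k, seed=seed)
```

The sketch builds a dense n-by-n affinity matrix and computes a full eigendecomposition, so it is only practical for datasets of moderate size; choosing `sigma` is left open here.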

Before the milestone, I expect to implement the NJW clustering and k-means algorithms in Python. After the milestone, I expect to compare the results of these algorithms and to add new features that make the implementation more efficient.
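The proposal does not fix how the two clusterings will be compared. One simple possibility, shown here only as an assumption, is the Rand index: the fraction of point pairs on which two labelings agree (both place the pair in the same cluster, or both place it in different clusters). The function name `rand_index` is hypothetical.

```python
import numpy as np


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree (0 to 1)."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("labelings must be 1-D and of equal length")
    if a.size < 2:
        raise ValueError("need at least two points to form a pair")

    # same_x[i, j] is True when points i and j share a cluster in labeling x.
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]

    # Count each unordered pair once by using the strict upper triangle.
    upper = np.triu_indices(a.size, k=1)
    return float(np.mean(same_a[upper] == same_b[upper]))
```

The index does not depend on how the clusters are numbered, so the k-means and NJW labels can be passed in directly without matching cluster ids first.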

Timeline:

· Read papers on the NJW spectral clustering algorithm.

· Before the milestone: implement the NJW algorithm and the k-means algorithm in Python.

· Before the final report: debug the algorithms and test them on the data.

· Analyze the results and write the final report.

References

[1] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 849-856, Cambridge, MA, 2002. MIT Press.

[2] http://genome-www.stanford.edu/cellcycle/data/rawdata/