My email: hao.luo@dartmouth.edu
Motivation: In molecular and medical biology, clustering algorithms are useful for detecting similar functions and potential relations among gene expression profiles. Traditional clustering algorithms such as k-means have already been widely applied to gene expression data. However, owing to experimental limitations, gene expression data often suffer from missing values and inaccurate records. In the last few years, a more robust clustering algorithm, spectral clustering, has become popular, and several variations of it have been developed. Compared with k-means, spectral clustering is more robust to noise and missing data and more useful in detecting unusual patterns. We can therefore reasonably expect spectral clustering to perform better than traditional k-means in this application.
In this project, I will apply recently developed spectral clustering
methods to the analysis of DNA microarray data. At a minimum, I will
implement a variation of a spectral clustering algorithm developed
by
Before the milestone, I expect to implement the NJW clustering and
k-means algorithms in Python. After the milestone, I expect to compare
the results of these algorithms and to add new features that make the
implementation more efficient.
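As a rough illustration of what the planned implementation might look like, the following is a minimal sketch of the NJW steps (Gaussian affinity, normalized affinity matrix, top-k eigenvectors, row normalization, k-means on the rows). The function names `kmeans` and `njw_spectral_clustering`, the fixed `sigma` default, and the farthest-first k-means initialization are assumptions made for this example and are not taken from the project itself. Missing values in the microarray data are assumed to have been handled beforehand.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means with a deterministic farthest-first start
    (an illustrative choice, not prescribed by the NJW paper)."""
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        d2 = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # Assign each point to its nearest center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def njw_spectral_clustering(data, k, sigma=1.0):
    """Cluster the rows of `data` into k groups following the NJW steps."""
    # Step 1: Gaussian affinity matrix with a zero diagonal.
    sq_dists = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    affinity = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)

    # Step 2: normalized affinity L = D^{-1/2} A D^{-1/2}.
    degree = np.maximum(affinity.sum(axis=1), 1e-12)  # guard against isolated points
    inv_sqrt = 1.0 / np.sqrt(degree)
    normalized = affinity * inv_sqrt[:, None] * inv_sqrt[None, :]

    # Step 3: eigenvectors of the k largest eigenvalues
    # (np.linalg.eigh returns eigenvalues in ascending order).
    _, vecs = np.linalg.eigh(normalized)
    top = vecs[:, -k:]

    # Step 4: rescale each row to unit length.
    norms = np.linalg.norm(top, axis=1, keepdims=True)
    embedded = top / np.maximum(norms, 1e-12)

    # Step 5: run k-means on the rows of the embedded matrix.
    return kmeans(embedded, k)
```

In this sketch, each row of `data` would be one gene's expression profile, and calling `kmeans(data, k)` directly on the same matrix gives the baseline that the spectral result is meant to be compared against. The choice of `sigma` strongly affects the affinity matrix and would need tuning on real data.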
Timeline:
· Read papers about the NJW spectral clustering algorithm.
· Before the milestone: implement the NJW algorithm and the k-means algorithm in Python.
· Before the final: debug the algorithms and test them on the data.
· Analyze the results and write the final report.
References
· A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 849-856.
· http://genome-www.stanford.edu/cellcycle/data/rawdata/