BIB-VERSION:: CS-TR-v2.0 ID:: ncstrl.dartmouthcs//TR97-324 ENTRY:: January 21, 1998 ORGANIZATION:: Dartmouth College, Computer Science TITLE:: Computing Dense Clusters On-line for Information Organization TYPE:: Technical Report (paper) REVISION:: 1 AUTHOR:: Aslam, J. AUTHOR:: Pelekhov, K. AUTHOR:: Rus, Daniela DATE:: October 1997 RETRIEVAL:: For a paper copy, email RETRIEVAL:: For a paper copy, write to Technical Report Librarian Department of Computer Science Dartmouth College 6211 Sudikoff Laboratory Hanover, NH 03755-3510 USA RETRIEVAL:: Compressed Postscript at http://www.cs.dartmouth.edu/reports/TR97-324.ps.Z RETRIEVAL:: PDF at http://www.cs.dartmouth.edu/reports/TR97-324.pdf ABSTRACT:: We present and analyze the off-line star algorithm for clustering static information systems and the on-line star algorithm for clustering dynamic information systems. These algorithms partition a document collection into a number of clusters that is naturally induced by the collection. We show a lower bound on the accuracy of the clusters produced by these algorithms. We use the random graph model to show that both star algorithms produce correct clusters in time Theta(V + E). Finally, we provide data from extensive experiments. NOTE:: Submitted to the 1998 SIGIR Conference. END:: ncstrl.dartmouthcs//TR97-324