Dartmouth College Computer Science Technical Report series

An Information Retrieval System for Performing Hierarchical Document Clustering
Eric Hagen
Dartmouth PCS-TR97-318


This thesis presents a system for web-based information retrieval that supports precise and informative post-query organization (automated clustering of the returned documents by topic), reducing the time the user actually spends searching. Most existing information retrieval systems depend on the user to issue intelligent, specific queries with Boolean operators in order to narrow the set of returned documents; in effect, the user must guess the appropriate keywords before performing the query. Other systems use a vector space model, which is better suited to the document-similarity computations that make hierarchical clustering of the returned documents by topic possible. This lets the user refine the results after the query. The system we propose is a hybrid between these two approaches: it is compatible with the former while providing the enhanced document organization made possible by the latter.
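As a rough illustration of how a vector space model supports hierarchical clustering, the sketch below represents each document as a bag-of-words term-frequency vector, scores pairs with cosine similarity, and repeatedly merges the two most similar clusters (average-link agglomerative clustering). This is a minimal sketch under stated assumptions, not the method described in the thesis: the whitespace tokenizer, raw term-frequency weighting, average-link criterion, and the helper names `tf_vector`, `cosine`, and `agglomerate` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    # Hypothetical tokenizer: lowercase and split on whitespace; raw term counts as weights.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def agglomerate(docs):
    """Average-link agglomerative clustering.

    Returns the merge history as a list of (doc_indices, similarity) pairs,
    in merge order; read bottom-up, it describes the cluster hierarchy.
    """
    vecs = [tf_vector(d) for d in docs]
    clusters = {i: (i,) for i in range(len(docs))}  # cluster id -> document indices
    merges = []
    next_id = len(docs)
    while len(clusters) > 1:
        best = None
        ids = list(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = clusters[ids[x]], clusters[ids[y]]
                # Average-link: mean pairwise similarity between the two clusters.
                sim = sum(cosine(vecs[i], vecs[j]) for i in a for j in b) / (len(a) * len(b))
                if best is None or sim > best[0]:
                    best = (sim, ids[x], ids[y])
        sim, p, q = best
        merged = tuple(sorted(clusters.pop(p) + clusters.pop(q)))
        clusters[next_id] = merged
        merges.append((merged, sim))
        next_id += 1
    return merges


if __name__ == "__main__":
    docs = [
        "apple banana fruit",
        "banana fruit salad",
        "python code compiler",
        "compiler code optimization",
    ]
    for members, sim in agglomerate(docs):
        print(members, round(sim, 3))
```

With these four toy documents, the two fruit documents and the two compiler documents each merge first, and the two topic clusters join only at the top of the hierarchy. Quadratic pairwise recomputation keeps the sketch short; a practical system would cache similarities or use a library implementation.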

Note: Senior Honors Thesis. Advisor: Javed Aslam.

Available as compressed PostScript (.ps.Z, 1059KB) and PDF (699KB, derived from the .ps.Z).

Bibliographic citation for this report:
   Eric Hagen, "An Information Retrieval System for Performing Hierarchical Document Clustering." Dartmouth Computer Science Technical Report PCS-TR97-318, May 1997.

To receive a paper copy of a report by mail, send your address and the TR number to reports AT cs.dartmouth.edu.

Copyright notice: The documents contained in this server are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Technical reports collection maintained by David Kotz.