|
Dartmouth College Computer Science Technical Report series |
CS home TR home TR search TR listserv |
| By author: | A B C D E F G H I J K L M N O P Q R S T U V W X Y Z | |
| By number: | 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986 | |
Abstract:
Evaluating retrieval systems, such as those submitted to the
annual TREC competition, usually requires a large number of
documents to be read and judged for relevance to query
topics. Test collections are far too big to be exhaustively
judged, so only a subset of documents is selected to form
the judgment ``pool.'' The selection method that TREC uses
produces pools that are still quite large. Research has
indicated that it is possible to rank the retrieval systems
correctly using substantially smaller pools.
This paper introduces an active learning algorithm whose goal is to reach the correct rankings using the smallest possible number of relevance judgments. It adds one document to the pool at a time, always trying to select the document with the highest information gain. Several variants of this algorithm are described, each with improvements on the one before. Results from experiments are included for comparison with the traditional TREC pooling method. The best version of the algorithm reliably outperforms the traditional method, although its degree of improvement varies.
Note:
Senior Honors Thesis. Advisor: Jay Aslam.
Bibliographic citation for this report: [plain text] [BIB] [BibTeX] [Refer]
Or copy and paste:
Lisa A. Torrey,
"An Active Learning Approach to Efficiently Ranking Retrieval Engines."
Dartmouth Computer Science Technical Report TR2003-449,
May 2003.
Notify me about new tech reports.

To receive paper copy of a report, by mail, send your address and the TR number to reports AT cs.dartmouth.edu
Copyright notice: The documents contained in this server are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Technical reports collection maintained by David Kotz.