Dartmouth College Computer Science
Technical Report series
TR search TR listserv
|By author:||A B C D E F G H I J K L M N O P Q R S T U V W X Y Z|
|By number:||2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986|
This thesis presents a unified method for simultaneous solution of three problems in Information Retrieval--- metasearch (the fusion of ranked lists returned by retrieval systems to elicit improved performance), efficient system evaluation (the accurate evaluation of retrieval systems with small numbers of relevance judgements), and pooling or ``active sample selection" (the selection of documents for manual judgement in order to develop sample pools of high precision or pools suitable for assessing system quality). The thesis establishes a unified theoretical framework for addressing these three problems and naturally generalizes their solution to the on-line context by incorporating feedback in the form of relevance judgements.
The algorithm--- Rankhedge for on-line retrieval, metasearch and system evaluation--- is the first to address these three problems simultaneously and also to generalize their solution to the on-line context. Optimality of the Rankhedge algorithm is developed via Bayesian and maximum entropy interpretations. Results of the algorithm prove to be significantly superior to previous methods when tested over a range of TREC (Text REtrieval Conference) data. In the absence of feedback, the technique equals or exceeds the performance of benchmark metasearch algorithms such as CombMNZ and Condorcet. The technique then dramatically improves on this performance during the on-line metasearch process. In addition, the technique generates pools of documents which include more relevant documents and produce more accurate system evaluations than previous techniques. The thesis includes an information-theoretic examination of the original Hedge algorithm as well as its adaptation to the context of ranked lists. The work also addresses the concept of information-theoretic similarity within the Rankhedge context and presents a method for decorrelating the predictor set to improve worst case performance. Finally, an information-theoretically optimal method for probabilistic ``active sampling" is presented with possible application to a broad range of practical and theoretical contexts.
Doctoral dissertation. Advisors: Jay Aslam (NEU), Dan Rockmore, Chris Bailey-Kellogg, and John Lafferty (CMU).
Bibliographic citation for this report: [plain text] [BIB] [BibTeX] [Refer]
Or copy and paste:
Robert A. Savell, "On-line Metasearch, Pooling, and System Evaluation." Dartmouth Computer Science Technical Report TR2005-543, June 2005.
Notify me about new tech reports.
Search the technical reports.
To receive paper copy of a report, by mail, send your address and the TR number to reports AT cs.dartmouth.edu
Copyright notice: The documents contained in this server are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Technical reports collection maintained by David Kotz.