Dartmouth College Computer Science
Technical Report series

Metasearch: Data Fusion for Document Retrieval
Mark H. Montague
Dartmouth TR2002-424

Abstract:

The metasearch problem is to optimally merge the ranked lists output by an arbitrary number of search systems into one ranked list. In this work:

(1) We show that metasearch improves upon not just the raw performance of the input search engines, but also upon the consistency of the input search engines from query to query.

(2) We show experimentally that simply weighting input systems by their average performance can dramatically improve fusion results.

(3) We show that score normalization is an important component of a metasearch engine, and that dependence upon statistical outliers appears to be the problem with the standard technique.

(4) We propose a Bayesian model for metasearch that outperforms the best input system on average and has performance competitive with standard techniques.

(5) We introduce Social Choice Theory to the metasearch problem, modeling metasearch as a democratic election. We adapt a positional voting algorithm, the Borda Count, to create a metasearch algorithm, achieving reasonable performance.

(6) We propose a metasearch model adapted from a majoritarian voting procedure, the Condorcet algorithm. The resulting algorithm is the best-performing algorithm in a number of situations.

(7) We propose three upper bounds for the problem, each bounding a different class of algorithms.
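
To illustrate the positional voting idea in item (5), here is a minimal sketch of a Borda Count applied to ranked lists, with each input system treated as a voter and each document as a candidate. The function name `borda_fuse`, the even split of leftover points among documents a system did not return, and the alphabetical tie-break are assumptions made for this illustration; they are not taken from the dissertation.

```python
from collections import defaultdict


def borda_fuse(ranked_lists):
    """Fuse ranked lists with a Borda Count; returns documents, best first.

    Each ranked list is one "voter". Lists are assumed to contain no
    duplicate documents (an assumption of this sketch).
    """
    # The candidates are all documents returned by at least one system.
    candidates = set()
    for ranking in ranked_lists:
        candidates.update(ranking)
    c = len(candidates)

    points = defaultdict(float)
    for ranking in ranked_lists:
        # The top-ranked document gets c points, the next c - 1, and so on.
        for position, doc in enumerate(ranking):
            points[doc] += c - position

        # Assumption: documents this system did not return share the
        # remaining points (1 .. c - len(ranking)) equally.
        unranked = candidates - set(ranking)
        if unranked:
            leftover = sum(range(1, c - len(ranking) + 1))
            share = leftover / len(unranked)
            for doc in unranked:
                points[doc] += share

    # Highest total first; ties broken alphabetically for determinism.
    return sorted(candidates, key=lambda d: (-points[d], d))


if __name__ == "__main__":
    systems = [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]
    print(borda_fuse(systems))  # ['b', 'a', 'c', 'd']
```

In this toy input, document b collects 11 points, a collects 9, c collects 6, and d collects 4, so b is ranked first.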

We present experimental results for each algorithm using two types of experiments on each of four data sets.
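
The majoritarian idea in item (6) can be sketched in a similar way. The example below runs a pairwise majority contest between every two documents and orders documents by net pairwise wins (a Copeland-style scoring). This is one simple way to turn Condorcet-style contests into a ranking and is not necessarily the procedure used in the dissertation. The function name `condorcet_rank`, the rule that an unreturned document ranks below every returned one, and the alphabetical tie-break are assumptions made for this illustration.

```python
from itertools import combinations


def condorcet_rank(ranked_lists):
    """Order documents by net pairwise majority wins, best first."""
    candidates = sorted({doc for ranking in ranked_lists for doc in ranking})
    # Map each system's documents to their rank position (0 = best).
    positions = [
        {doc: i for i, doc in enumerate(ranking)} for ranking in ranked_lists
    ]

    def prefers(pos, x, y):
        # Assumption: a document a system did not return sits below every
        # document it did return; two unreturned documents yield no vote.
        return pos.get(x, float("inf")) < pos.get(y, float("inf"))

    net_wins = {doc: 0 for doc in candidates}
    for x, y in combinations(candidates, 2):
        x_votes = sum(prefers(pos, x, y) for pos in positions)
        y_votes = sum(prefers(pos, y, x) for pos in positions)
        if x_votes > y_votes:
            net_wins[x] += 1
            net_wins[y] -= 1
        elif y_votes > x_votes:
            net_wins[y] += 1
            net_wins[x] -= 1

    # Most net wins first; ties broken alphabetically for determinism.
    return sorted(candidates, key=lambda d: (-net_wins[d], d))


if __name__ == "__main__":
    systems = [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]
    print(condorcet_rank(systems))  # ['b', 'a', 'c', 'd']
```

Here b beats every other document in a pairwise majority contest, so it is the Condorcet winner and is ranked first.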

Note: Ph.D. dissertation.


Available formats: compressed PostScript .ps.Z (508KB), PDF (860KB, derived from the .ps.Z)

Bibliographic citation for this report:
   Mark H. Montague, "Metasearch: Data Fusion for Document Retrieval." Dartmouth Computer Science Technical Report TR2002-424, May 2002.


