Many search engines exist, both on the World Wide Web and for specific databases of information such as US patents and medical abstracts, to name but a few examples. No one search algorithm or technique works best over all types of queries and underlying databases. The problem of metasearch is to intelligently combine the results of multiple search engines in response to a given query, obtaining a overall response whose quality equals, or exceeds, that of the (unknown) best search engine. Many well-known and commonly used metasearch algorithms are based on ad hoc techniques and tend to be rather simplistic; for example, simply interleaving the ranked lists returned by the underlying search engines or ordering documents by their rank-sum over the underlying systems.
To combat this problem, we have developed rigorous, mathematical
models for investigating the problem of metasearch. In one model,
based on decision theory, the average performance of the
underlying search engines can be assessed to obtain, for example, the
expected probability that a document ranked at level
by search
engine will be relevant or irrelevant with respect to a given
query. A collection of retrieved documents may then be ordered by
odds of relevance, calculated in a Bayes-optimal manner by
combining the probabilistic evidence of relevance obtained for each
document from the underlying search
engines [AM01,AM00].
We have implemented algorithms based on this decision-theoretic model and tested them against benchmark data from the TREC2 competition. Even assuming search engine independence (a condition known not to hold in practice), the performance of his method equals or exceed the performance of the previously best, highly tuned, ad hoc techniques. Furthermore, unlike ad hoc techniques, this mathematical model provides a clear foundation for progress and further research.
We have also studied and developed a model for metasearch based on social choice theory. The problem of metasearch can be viewed as a multi-candidate election: documents are ``candidates'' and search engines are ``voters'' expressing preferential rankings among the candidates. As such, algorithms from election theory can be brought to bear on the metasearch problem, leveraging on the large body of work developed in the field of social choice. Many interesting algorithmic issues arise in the analysis and implementation of metasearch algorithms based on social choice theory; for example, typical multi-candidate elections have a small number of candidates and a large number of voters while the analogous metasearch problem is quite the opposite. We have developed, analyzed and implemented a number of metasearch algorithms based on multi-candidate election strategies and tested them on benchmark data sets from the TREC competition and the World Wide Web [MA02,AM01]. Our results demonstrate that these algorithms are efficient and outperform widely used ad hoc techniques.
Finally, we have also demonstrated that metasearch improves the consistency of search systems across queries [MA01a], and we have developed a number of techniques for improving the quality of nearly all standard metasearch algorithms [MA01b], including the well-known and widely used ad hoc techniques.