C. Bailey-Kellogg and N. Ramakrishnan, "Active data mining of correspondence for qualitative assessment of scientific computations", Proc. QR, 2003, 23-30. [preprint]

Active data mining constructs and evaluates candidate models that explain a dataset, and reasons about the cost and impact of additional samples on refining and selecting among those models. It is particularly appropriate for applications in which data collection, whether by experiment or simulation, is expensive. This paper develops an active mining mechanism based on a multi-level, qualitative analysis of correspondence. The correspondence operators presented here use domain knowledge to establish relationships among objects, evaluate implications for model selection, and exploit identified weaknesses to focus additional data collection. The utility of the qualitative framework is demonstrated in two scientific computing applications: matrix spectral portrait analysis and graphical assessment of Jordan forms of matrices. Results show that the mechanism samples computational experiments efficiently and successfully uncovers high-level properties of the data. The framework helps overcome noise and sparsity by leveraging domain knowledge to detect mutually reinforcing interpretations of spatial data.
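
The general idea of weighing the cost of a new sample against its impact on model selection can be illustrated with a small sketch. This is not the paper's correspondence-based mechanism; it is a generic, simplified stand-in that assumes candidate models are plain functions of one variable, that the "impact" of a sample is the spread of model predictions at that point, and that each candidate location has a known cost. All names here (disagreement, next_sample, consistent, and the toy models and cost function) are hypothetical.

```python
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def disagreement(models: Sequence[Model], x: float) -> float:
    """Spread of the model predictions at x (max minus min)."""
    preds = [m(x) for m in models]
    return max(preds) - min(preds)


def next_sample(models: Sequence[Model],
                candidates: Sequence[float],
                cost: Callable[[float], float]) -> float:
    """Pick the candidate with the largest disagreement per unit cost."""
    return max(candidates, key=lambda x: disagreement(models, x) / cost(x))


def consistent(models: Sequence[Model], x: float, y: float,
               tol: float) -> List[Model]:
    """Keep only the models whose prediction at x is within tol of y."""
    return [m for m in models if abs(m(x) - y) <= tol]


# Hypothetical setup: two candidate models that agree at 0 and 1
# and diverge elsewhere.
def linear(x: float) -> float:
    return x


def quadratic(x: float) -> float:
    return x * x


models = [linear, quadratic]
candidates = [0.0, 0.5, 1.0, 2.0, 3.0]


def cost(x: float) -> float:
    # Assumed cost model: samples farther from the origin are more expensive.
    return 1.0 + x


x_next = next_sample(models, candidates, cost)

# Pretend the expensive experiment at x_next returned the quadratic's value.
y_observed = quadratic(x_next)
remaining = consistent(models, x_next, y_observed, tol=0.1)
```

In this toy setup, sampling where the models agree (x = 0 or x = 1) cannot separate them at any price, so the selection rule spends its budget where the predictions diverge enough to justify the cost.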