N. Ramakrishnan and C. Bailey-Kellogg, "Using physical properties for mining in data-scarce domains", IEEE Computing in Science & Engineering, 2002, 4:31-43.

Data mining has traditionally been motivated by the desire to make inferences from vast repositories of data. However, many scientific domains are characterized by a scarcity of data rather than an abundance. This is especially true when data is generated by costly computer simulations. In such data-scarce domains, it is advantageous to use a data collection and sampling strategy that focuses only on the regions that are most interesting from a data mining perspective. In this paper, we describe how to design sampling strategies for data-scarce domains by exploiting knowledge of physical properties. The physical properties we concentrate on include continuity, correspondence, and locality, which are especially relevant in spatial data interpretation applications. The methodology resulting from this approach is demonstrated in two diverse applications: mining pockets in spatial data and qualitative determination of Jordan forms of matrices.