BIB-VERSION:: CS-TR-v2.0
ID:: ncstrl.dartmouthcs//TR2003-459
ENTRY:: June 05, 2003
ORGANIZATION:: Dartmouth College, Computer Science
TITLE:: Efficient I/O for Computational Grid Applications
TYPE:: Technical Report (paper)
REVISION:: 1
AUTHOR:: Oldfield, Ron A.
DATE:: May 2003
RETRIEVAL:: For a paper copy, email
RETRIEVAL:: For a paper copy, write to Technical Report Librarian, Department of Computer Science, Dartmouth College, 6211 Sudikoff Laboratory, Hanover, NH 03755-3510, USA
RETRIEVAL:: Compressed Postscript at http://www.cs.dartmouth.edu/reports/TR2003-459.ps.Z
RETRIEVAL:: PDF at http://www.cs.dartmouth.edu/reports/TR2003-459.pdf
ABSTRACT:: High-performance computing increasingly occurs on "computational grids" composed of heterogeneous and geographically distributed systems of computers, networks, and storage devices that collectively act as a single "virtual" computer. A key challenge in this environment is to provide efficient access to data distributed across remote data servers. This dissertation explores some of the issues associated with I/O for wide-area distributed computing and describes an I/O system, called Armada, with the following features: a framework to allow application and dataset providers to flexibly compose graphs of processing modules that describe the distribution, application interfaces, and processing required of the dataset before or after computation; an algorithm to restructure application graphs to increase parallelism and to improve network performance in a wide-area network; and a hierarchical graph-partitioning scheme that deploys components of the application graph in a way that is both beneficial to the application and sensitive to the administrative policies of the different administrative domains. Experiments show that applications using Armada perform well in both low- and high-bandwidth environments, and that our approach does an exceptional job of hiding the network latency inherent in grid computing.
NOTE:: This is a reformatted version of Ron Oldfield's Ph.D. dissertation. Unlike the dissertation submitted to Dartmouth College, this version is single-spaced, uses 11pt fonts, and is formatted specifically for double-sided printing.
END:: ncstrl.dartmouthcs//TR2003-459