Abstract: High-performance computing increasingly occurs on "computational grids" composed of heterogeneous and geographically distributed systems of computers, networks, and storage devices that collectively act as a single "virtual" computer. A key challenge in this environment is to provide efficient access to data distributed across remote data servers. Our parallel I/O framework, called Armada, allows application programmers and data-set providers to flexibly compose graphs of processing modules that describe the distribution, application interfaces, and processing required of the data set before computation. Although the framework provides a simple programming model for the application programmer and the data-set provider, the resulting graph may contain bottlenecks that prevent efficient data access. In this paper, we present an algorithm that restructures Armada graphs to distribute computation and data flow, improving performance in the context of a wide-area computational grid.
Keywords: parallel computing, parallel I/O, file system, distributed computing
Copyright © 2006 by Springer-Verlag. The copy made available here is the authors' version; for a definitive copy see the publisher's version described above.