Papers with keyword 'pario'

That is, papers related to parallel I/O: I/O for parallel computers.

Papers are listed in reverse-chronological order, each followed by its abstract. For full information and a PDF, follow each entry's Details link.

2006:
Ron Oldfield and David Kotz. Improving data access for computational grid applications. Cluster Computing. January 2006. [Details]

High-performance computing increasingly occurs on “computational grids” composed of heterogeneous and geographically distributed systems of computers, networks, and storage devices that collectively act as a single “virtual” computer. A key challenge in this environment is to provide efficient access to data distributed across remote data servers. Our parallel I/O framework, called Armada, allows application and data-set providers to flexibly compose graphs of processing modules that describe the distribution, application interfaces, and processing required of the dataset before computation. Although the framework provides a simple programming model for the application programmer and the data-set provider, the resulting graph may contain bottlenecks that prevent efficient data access. In this paper, we present an algorithm used to restructure Armada graphs that distributes computation and data flow to improve performance in the context of a wide-area computational grid.
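
As a rough illustration of what such a restructuring does, consider the common case of a filter that sits above a merge of several remote data servers. Pushing the filter below the merge, and replicating it once per server, moves the computation next to the data so that only filtered data crosses the wide-area network. The sketch below is a toy version of that one transform in C; the module names and tree encoding are illustrative, not Armada's actual API.

    #include <stdio.h>
    #include <stdlib.h>

    struct module {
        const char *name;
        struct module *kids[4];
        int nkids;
    };

    static struct module *mk(const char *name)
    {
        struct module *m = calloc(1, sizeof *m);
        m->name = name;
        return m;
    }

    static struct module *child(struct module *p, struct module *c)
    {
        p->kids[p->nkids++] = c;
        return p;
    }

    /* Restructure filter(merge(s1..sn)) into merge(filter(s1)..filter(sn)):
     * each filter copy now runs beside its data server, so only filtered
     * data flows over the wide-area links. */
    static struct module *push_filter_down(struct module *filter)
    {
        struct module *merge = filter->kids[0];
        for (int i = 0; i < merge->nkids; i++)
            merge->kids[i] = child(mk(filter->name), merge->kids[i]);
        free(filter);
        return merge;
    }

    static void show(const struct module *m, int depth)
    {
        printf("%*s%s\n", depth * 2, "", m->name);
        for (int i = 0; i < m->nkids; i++)
            show(m->kids[i], depth + 1);
    }

    int main(void)
    {
        struct module *merge = mk("merge");
        child(merge, mk("server-A"));
        child(merge, mk("server-B"));
        struct module *graph = child(mk("filter"), merge);

        graph = push_filter_down(graph);   /* the restructuring step */
        show(graph, 0);
        return 0;
    }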

2004:
Darren Erik Vengroff and David Kotz. A Holesome File System. Technical Report, May 2004. Originally written in July 1995; released May 2004. [Details]

We present a novel approach to fully dynamic management of physical disk blocks in Unix file systems. By adding a single system call, zero(), to an existing file system, we permit applications to create holes, that is, regions of files to which no physical disk blocks are allocated, far more flexibly than previously possible. zero() can create holes in the middle of existing files.

Using zero(), it is possible to efficiently implement applications including a variety of databases and I/O-efficient computation systems on top of the Unix file system. zero() can also be used to implement an efficient file-system-based paging mechanism. In some I/O-efficient computations, the availability of zero() effectively doubles disk capacity by allowing blocks of temporary files to be reallocated to new files as they are read.

Experiments on a Linux ext2 file system augmented by zero() demonstrate that where their functionality overlaps, zero() is more efficient than ftruncate(). Additional experiments reveal that in exchange for added effective disk capacity, I/O-efficient code pays only a small performance penalty.
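
To make the proposed semantics concrete: zero() deallocates the physical blocks backing a byte range without changing the file's length, something ftruncate() can only do at the end of a file. The sketch below emulates that behavior on a modern Linux system, where fallocate(2) with FALLOC_FL_PUNCH_HOLE provides the same effect; the zero() wrapper mirrors the paper's proposed call, not an existing system call, and the file name is arbitrary.

    /* A minimal sketch of the proposed zero() semantics: release the
     * blocks backing [offset, offset+len) so the region reads as zeros,
     * leaving the file length unchanged. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical wrapper matching the paper's proposed interface. */
    static int zero(int fd, off_t offset, off_t len)
    {
        /* KEEP_SIZE punches a hole in the middle of the file without
         * truncating it -- exactly what ftruncate() cannot do. */
        return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                         offset, len);
    }

    int main(void)
    {
        int fd = open("scratch.dat", O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* ... write temporary data, read it back ... then release the
         * blocks of a region we are finished with, returning them to
         * the file system's free pool. */
        if (zero(fd, 4096, 1048576) != 0)
            perror("zero");

        close(fd);
        return 0;
    }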


2003:
Ron Oldfield. Efficient I/O for Computational Grid Applications. PhD thesis, May 2003. Available as Dartmouth Computer Science Technical Report TR2003-459. [Details]

High-performance computing increasingly occurs on “computational grids” composed of heterogeneous and geographically distributed systems of computers, networks, and storage devices that collectively act as a single “virtual” computer. A key challenge in this environment is to provide efficient access to data distributed across remote data servers. This dissertation explores some of the issues associated with I/O for wide-area distributed computing and describes an I/O system, called Armada, with the following features: a framework to allow application and dataset providers to flexibly compose graphs of processing modules that describe the distribution, application interfaces, and processing required of the dataset before or after computation; an algorithm to restructure application graphs to increase parallelism and to improve network performance in a wide-area network; and a hierarchical graph-partitioning scheme that deploys components of the application graph in a way that is both beneficial to the application and sensitive to the administrative policies of the different administrative domains. Experiments show that applications using Armada perform well in both low- and high-bandwidth environments, and that our approach does an exceptional job of hiding the network latency inherent in grid computing.

2002:
Ron Oldfield and David Kotz. Using Emulab network testbed to evaluate the Armada I/O framework for computational grids. Technical Report, September 2002. [Details]

This short report describes our experiences using the Emulab network testbed at the University of Utah to test the performance of the Armada framework for parallel I/O on computational grids.

Ron Oldfield and David Kotz. Armada: a parallel I/O framework for computational grids. Future Generation Computing Systems (FGCS). March 2002. [Details]

High-performance computing increasingly occurs on “computational grids” composed of heterogeneous and geographically distributed systems of computers, networks, and storage devices that collectively act as a single “virtual” computer. One of the great challenges for this environment is to provide efficient access to data that is distributed across remote data servers in a grid. In this paper, we describe our solution, a framework we call Armada. Armada allows applications to flexibly compose modules to access their data, and to place those modules at appropriate hosts within the grid to reduce network traffic.

Ron Oldfield and David Kotz. The Armada framework for parallel I/O on computational grids. Proceedings of the USENIX Conference on File and Storage Technologies (FAST). January 2002. Work-in-progress report. [Details]

2001:
David Kotz. Disk-directed I/O for MIMD Multiprocessors. High Performance Mass Storage and Parallel I/O: Technologies and Applications. September 2001. [Details]

Many scientific applications that run on today’s multiprocessors, such as weather forecasting and seismic analysis, are bottlenecked by their file-I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth to the application. Although libraries and enhanced file-system interfaces can make a significant improvement, we believe that fundamental changes are needed in the file-server software. We propose a new technique, disk-directed I/O, to allow the disk servers to determine the flow of data for maximum performance. Our simulations show that tremendous performance gains are possible both for simple reads and writes and for an out-of-core application. Indeed, our disk-directed I/O technique provided consistent high performance that was largely independent of data distribution, obtained up to 93% of peak disk bandwidth, and was as much as 18 times faster than the traditional technique.
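
The heart of the technique can be shown in a few lines: rather than serving client requests in arrival order, the I/O server accepts one collective request and then drives the transfer itself in disk order. The single-process sketch below is a minimal illustration; the block map and send_to_node() are stand-ins for a real file system and interconnect.

    /* Disk-directed I/O reduced to its core idea: given one collective
     * read request, the disk server sorts the work by disk address and
     * controls the flow of data itself. */
    #include <stdio.h>
    #include <stdlib.h>

    struct block {
        long disk_addr;   /* physical location on disk */
        long file_off;    /* logical offset in the file */
        int  owner;       /* compute node that should receive this block */
    };

    static int by_disk_addr(const void *a, const void *b)
    {
        const struct block *x = a, *y = b;
        return (x->disk_addr > y->disk_addr) - (x->disk_addr < y->disk_addr);
    }

    /* Placeholder for pushing one block across the interconnect. */
    static void send_to_node(int node, long file_off)
    {
        printf("node %d <- block at file offset %ld\n", node, file_off);
    }

    /* The server's side of a collective read: one pass over the disk,
     * in disk order, regardless of how the data is distributed. */
    static void disk_directed_read(struct block *blocks, int n)
    {
        qsort(blocks, n, sizeof blocks[0], by_disk_addr);
        for (int i = 0; i < n; i++)
            send_to_node(blocks[i].owner, blocks[i].file_off);
    }

    int main(void)
    {
        /* A file striped over 4 compute nodes, laid out non-sequentially. */
        struct block blocks[] = {
            { 930, 0, 0 }, { 120, 4096, 1 }, { 455, 8192, 2 }, { 300, 12288, 3 },
        };
        disk_directed_read(blocks, 4);
        return 0;
    }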

David Kotz and Carla Schlatter Ellis. Practical Prefetching Techniques for Multiprocessor File Systems. High Performance Mass Storage and Parallel I/O: Technologies and Applications. September 2001. [Details]

Improvements in the processing speed of multiprocessors are outpacing improvements in the speed of disk hardware. Parallel disk I/O subsystems have been proposed as one way to close the gap between processor and disk speeds. In a previous paper we showed that prefetching and caching have the potential to deliver the performance benefits of parallel file systems to parallel applications. In this paper we describe experiments with practical prefetching policies that base decisions only on on-line reference history, and that can be implemented efficiently. We also test the ability of these policies across a range of architectural parameters.
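
For flavor, here is a minimal sketch of one such policy, built only on on-line reference history: once a process has read a few blocks in sequence, prefetch one block ahead. The threshold and the prefetch_block() call are illustrative assumptions, not the exact policies evaluated in the paper.

    #include <stdio.h>

    #define SEQ_THRESHOLD 2   /* blocks in a row before we trust the pattern */

    struct stream {
        long last_block;   /* last block this process referenced */
        int  run_length;   /* length of the current sequential run */
    };

    static void prefetch_block(long b) { printf("prefetch block %ld\n", b); }

    static void on_reference(struct stream *s, long block)
    {
        if (block == s->last_block + 1)
            s->run_length++;
        else
            s->run_length = 1;          /* run broken; start over */
        s->last_block = block;

        if (s->run_length >= SEQ_THRESHOLD)
            prefetch_block(block + 1);  /* one-block lookahead */
    }

    int main(void)
    {
        struct stream s = { -2, 0 };
        long refs[] = { 0, 1, 2, 3, 10, 11, 12 };
        for (int i = 0; i < 7; i++)
            on_reference(&s, refs[i]);
        return 0;
    }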

Ron Oldfield and David Kotz. Scientific Applications using Parallel I/O. High Performance Mass Storage and Parallel I/O: Technologies and Applications. September 2001. [Details]

Scientific applications are increasingly being implemented on massively parallel supercomputers. Many of these applications have intense I/O demands, as well as massive computational requirements. This paper is essentially an annotated bibliography of papers and other sources of information about scientific applications using parallel I/O.

Ron Oldfield and David Kotz. Armada: A parallel file system for computational grids. Proceedings of the IEEE/ACM International Symposium on Cluster Computing and the Grid (ccGrid). May 2001. [Details]

High-performance distributed computing appears to be shifting away from tightly-connected supercomputers to computational grids composed of heterogeneous systems of networks, computers, storage devices, and various other devices that collectively act as a single geographically distributed virtual computer. One of the great challenges for this environment is providing efficient parallel data access to remote distributed datasets. In this paper, we discuss some of the issues associated with parallel I/O and computational grids and describe the design of a flexible parallel file system that allows the application to control the behavior and functionality of virtually all aspects of the file system.

2000:
David Kotz. Bibliography about Parallel I/O. BibTeX bibliography, 2000. Original version published 1994. [Details]

A bibliography of many references on parallel I/O and multiprocessor file-system issues. As of the fifth edition, it is available in HTML format.

1999:
David Kotz and Ravi Jain. I/O in Parallel and Distributed Systems. Encyclopedia of Computer Science and Technology. 1999. Supplement 25. [Details]

We sketch the reasons for the I/O bottleneck in parallel and distributed systems, pointing out that it can be viewed as a special case of a general bottleneck that arises at all levels of the memory hierarchy. We argue that because of its severity, the I/O bottleneck deserves systematic attention at all levels of system design. We then present a survey of the issues raised by the I/O bottleneck in six key areas of parallel and distributed systems: applications, algorithms, languages and compilers, run-time libraries, operating systems, and architecture.

1998:
Ron Oldfield and David Kotz. Applications of Parallel I/O. Technical Report, August 1998. Supplement to PCS-TR96-297. [Details]

Scientific applications are increasingly being implemented on massively parallel supercomputers. Many of these applications have intense I/O demands, as well as massive computational requirements. This paper is essentially an annotated bibliography of papers and other sources of information about scientific applications using parallel I/O. It will be updated periodically.

Matthew P. Carter and David Kotz. An Implementation of the Vesta Parallel File System API on the Galley Parallel File System. Technical Report, April 1998. [Details]

To demonstrate the flexibility of the Galley parallel file system and to analyze the efficiency and flexibility of the Vesta parallel file system interface, we implemented Vesta’s application-programming interface on top of Galley. We implemented the Vesta interface using Galley’s file-access methods, whose design arose from extensive testing and characterization of the I/O requirements of scientific applications for high-performance multiprocessors. We used a parallel CPU, parallel I/O, out-of-core matrix-multiplication application to test the Vesta interface in both its ability to specify data access patterns and in its run-time efficiency. In spite of its powerful ability to specify the distribution of regular, non-overlapping data access patterns across disks, we found that the Vesta interface has some significant limitations. We discuss these limitations in detail in the paper, along with the performance results.

1997:
Nils Nieuwejaar and David Kotz. The Galley Parallel File System. Parallel Computing. June 1997. [Details]

Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley’s file structure and application interface, as well as the performance advantages offered by that interface.

Sanjay Khanna and David Kotz. A Split-Phase Interface for Parallel File Systems. Technical Report, March 1997. [Details]

We describe the effects of a new user-level library for the Galley Parallel File System. This library allows some pre-existing sequential programs to make use of the Galley Parallel File System with minimal modification. It permits programs to efficiently use the parallel file system because the user-level library groups accesses together. We examine the performance of our library, and we show how code needs to be modified to use the library.
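
The idea of a split-phase interface is to separate posting a request from waiting for its completion, which gives the library a window in which to group queued accesses into larger, more efficient requests. A minimal sketch follows, with hypothetical sp_read()/sp_wait() names rather than the library's actual interface.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define MAXREQ 64

    struct request { int fd; void *buf; long off; long len; int done; };

    static struct request queue[MAXREQ];
    static int nreq;

    /* Phase 1: post a read and return a handle immediately; the caller
     * keeps computing while requests accumulate. */
    static int sp_read(int fd, void *buf, long off, long len)
    {
        queue[nreq] = (struct request){ fd, buf, off, len, 0 };
        return nreq++;
    }

    /* Phase 2: wait on a handle.  This sketch flushes the whole queue in
     * one batch -- the point at which a parallel file system could group
     * and reorder the accesses. */
    static void sp_wait(int handle)
    {
        for (int i = 0; i < nreq; i++)
            if (!queue[i].done) {
                pread(queue[i].fd, queue[i].buf, queue[i].len, queue[i].off);
                queue[i].done = 1;
            }
        (void)handle;
    }

    int main(void)
    {
        char a[256], b[256];
        int fd = open("data.bin", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        int h1 = sp_read(fd, a, 0, sizeof a);
        int h2 = sp_read(fd, b, 4096, sizeof b);
        /* ... overlap computation here ... */
        sp_wait(h1);
        sp_wait(h2);
        close(fd);
        return 0;
    }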

David Kotz. Disk-directed I/O for MIMD Multiprocessors. ACM Transactions on Computer Systems. February 1997. [Details]

Many scientific applications that run on today’s multiprocessors, such as weather forecasting and seismic analysis, are bottlenecked by their file-I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth to the application. Although libraries and enhanced file-system interfaces can make a significant improvement, we believe that fundamental changes are needed in the file-server software. We propose a new technique, disk-directed I/O, to allow the disk servers to determine the flow of data for maximum performance. Our simulations show that tremendous performance gains are possible both for simple reads and writes and for an out-of-core application. Indeed, our disk-directed I/O technique provided consistent high performance that was largely independent of data distribution, obtained up to 93% of peak disk bandwidth, and was as much as 18 times faster than the traditional technique.

1996:
David Kotz. Introduction to Multiprocessor I/O Architecture. Input/Output in Parallel and Distributed Computer Systems. 1996. [Details]

The computational performance of multiprocessors continues to improve by leaps and bounds, fueled in part by rapid improvements in processor and interconnection technology. I/O performance thus becomes ever more critical, to avoid becoming the bottleneck of system performance. In this paper we provide an introduction to I/O architectural issues in multiprocessors, with a focus on disk subsystems. While we discuss examples from actual architectures and provide pointers to interesting research in the literature, we do not attempt to provide a comprehensive survey. We concentrate on a study of the architectural design issues, and the effects of different design alternatives.

Nils Nieuwejaar and David Kotz. Low-level Interfaces for High-level Parallel I/O. Input/Output in Parallel and Distributed Computer Systems. 1996. [Details]

As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. By tracing all the activity of a parallel file system in a production, scientific computing environment, we show that many applications exhibit highly regular, but non-consecutive I/O access patterns. Since the conventional interface does not provide an efficient method of describing these patterns, we present three extensions to the interface that support strided, nested-strided, and nested-batched I/O requests. We show how these extensions can be used to express common access patterns.
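
To show the semantics of the simplest extension, the sketch below implements a strided read naively on top of pread(): count records of reclen bytes whose file offsets are stride bytes apart. The signature is an illustrative assumption; the point of the real extension is that the file system receives the entire pattern in one request and can service it efficiently.

    #include <unistd.h>

    /* Read `count` records of `reclen` bytes, starting at `off`, with the
     * offsets of successive records separated by `stride` bytes. */
    static long read_strided(int fd, char *buf, long off,
                             long reclen, long stride, long count)
    {
        long total = 0;
        for (long i = 0; i < count; i++) {
            long n = pread(fd, buf + i * reclen, reclen, off + i * stride);
            if (n <= 0)
                break;
            total += n;
        }
        return total;
    }

A nested-strided request simply adds another level of stride on top of this pattern (for example, a rectangular block of a matrix stored in row-major order).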

Alok Choudhary and David Kotz. Large-Scale File Systems with the Flexibility of Databases. ACM Computing Surveys. December 1996. Position paper for the Working Group on Storage I/O for Large-Scale Computing, ACM Workshop on Strategic Directions in Computing Research. Available on-line only. [Details]

We note that large-scale computing includes many applications with intensive I/O demands. A data-storage system for such applications must address two issues: locating the appropriate data set, and accessing the contents of the data set. Today, there are two extreme models of data location and management: 1) file systems, which can be fast but which require a user to manage the structure of the file-name space and, often, of the file contents; and 2) object-oriented-database systems, in which even the smallest granule of data is stored as an object with associated access methods, which is very flexible but often slow. We propose a solution that may provide the performance of file systems with the flexibility of object databases.

Nils A. Nieuwejaar. Galley: A New Parallel File System for Parallel Applications. PhD thesis, November 1996. Available as Dartmouth Computer Science Technical Report PCS-TR96-300. [Details]

Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Most multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access those multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated application and library programmers to use knowledge about their I/O to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support.

In this work we examine current multiprocessor file systems, as well as how those file systems are used by scientific applications. Contrary to the expectations of the designers of current parallel file systems, the workloads on those systems are dominated by requests to read and write small pieces of data. Furthermore, rather than being accessed sequentially and contiguously, as in uniprocessor and supercomputer workloads, files in multiprocessor file systems are accessed in regular, structured, but non-contiguous patterns.

Based on our observations of multiprocessor workloads, we have designed Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. In this work, we introduce Galley and discuss its design and implementation. We describe Galley’s new three-dimensional file structure and discuss how that structure can be used by parallel applications to achieve higher performance. We introduce several new data-access interfaces, which allow applications to explicitly describe the regular access patterns we found to be common in parallel file system workloads. We show how these new interfaces allow parallel applications to achieve tremendous increases in I/O performance. Finally, we discuss how Galley’s new file structure and data-access interfaces can be useful in practice.
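
The three-dimensional structure is easy to picture as nested containers, as in the hedged sketch below: a file is a set of subfiles, typically one per I/O node, and each subfile holds named forks, each an independent linear sequence of bytes. The field names are illustrative, not Galley's actual metadata layout.

    #include <stddef.h>

    struct fork {                /* dimension 3: a linear byte stream */
        char   name[32];
        size_t length;
        /* ... block map ... */
    };

    struct subfile {             /* dimension 2: the data on one I/O node */
        int          ionode;     /* which I/O node stores this subfile */
        struct fork *forks;
        int          nforks;
    };

    struct file {                /* dimension 1: the file itself */
        struct subfile *subfiles;
        int             nsubfiles;
    };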


David Kotz. Applications of Parallel I/O. Technical Report, October 1996. Release 1. [Details]

Scientific applications are increasingly being implemented on massively parallel supercomputers. Many of these applications have intense I/O demands, as well as massive computational requirements. This paper is essentially an annotated bibliography of papers and other sources of information about scientific applications using parallel I/O. It will be updated periodically.

David Kotz. Tuning STARFISH. Technical Report, October 1996. [Details]

STARFISH is a parallel file-system simulator we built for our research into the concept of disk-directed I/O. In this report, we detail the steps taken to tune the file systems supported by STARFISH, which include a traditional parallel file system (with caching) and a disk-directed I/O system. In particular, we added support for two-phase I/O, adopted smarter disk scheduling, increased the maximum number of outstanding requests that a compute processor may make to each disk, and added gather/scatter block transfer. We also present results of the experiments driving the tuning effort.
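
Of these additions, two-phase I/O is the simplest to picture: data moves between disk and memory in large contiguous chunks that match the file layout (phase one), then is permuted in memory into the distribution the application wants (phase two). The single-process sketch below illustrates the idea; the cyclic target layout and sizes are arbitrary examples.

    #include <stdio.h>
    #include <string.h>

    #define N     16
    #define NODES 4

    int main(void)
    {
        int file[N], phase1[N], app[NODES][N / NODES];

        for (int i = 0; i < N; i++) file[i] = i;

        /* Phase 1: contiguous transfer -- what disks do best. */
        memcpy(phase1, file, sizeof file);

        /* Phase 2: in-memory redistribution to a cyclic layout;
         * element i belongs to node i % NODES. */
        int pos[NODES] = { 0 };
        for (int i = 0; i < N; i++)
            app[i % NODES][pos[i % NODES]++] = phase1[i];

        for (int n = 0; n < NODES; n++) {
            printf("node %d:", n);
            for (int j = 0; j < N / NODES; j++) printf(" %d", app[n][j]);
            printf("\n");
        }
        return 0;
    }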

Nils Nieuwejaar, David Kotz, Apratim Purakayastha, Carla Schlatter Ellis, and Michael Best. File-Access Characteristics of Parallel Scientific Workloads. IEEE Transactions on Parallel and Distributed Systems. October 1996. [Details]

Phenomenal improvements in the computational performance of multiprocessors have not been matched by comparable gains in I/O system performance. This imbalance has resulted in I/O becoming a significant bottleneck for many scientific applications. One key to overcoming this bottleneck is improving the performance of multiprocessor file systems.

The design of a high-performance multiprocessor file system requires a comprehensive understanding of the expected workload. Unfortunately, until recently, no general workload studies of multiprocessor file systems have been conducted. The goal of the CHARISMA project was to remedy this problem by characterizing the behavior of several production workloads, on different machines, at the level of individual reads and writes. The first set of results from the CHARISMA project describe the workloads observed on an Intel iPSC/860 and a Thinking Machines CM-5. This paper is intended to compare and contrast these two workloads for an understanding of their essential similarities and differences, isolating common trends and platform-dependent variances. Using this comparison, we are able to gain more insight into the general principles that should guide multiprocessor file-system design.


David Kotz. STARFISH parallel file-system simulator. The basis for my research on disk-directed I/O; used by at least two other research groups, October 1996. Third release. [Details]

STARFISH is a simulator for experimenting with concepts in parallel file systems. It is based on Eric Brewer’s Proteus simulator from MIT, version 3.01, and runs only on (MIPS-based) DECstations. I have used this simulator in experiments for several research papers about disk-directed I/O.

David Kotz and Nils Nieuwejaar. Flexibility and Performance of Parallel File Systems. Proceedings of the International Conference of the Austrian Center for Parallel Computation (ACPC). September 1996. Invited paper. [Details]

As we gain experience with parallel file systems, it becomes increasingly clear that a single solution does not suit all applications. For example, it appears to be impossible to find a single appropriate interface, caching policy, file structure, or disk-management strategy. Furthermore, the proliferation of file-system interfaces and abstractions makes applications difficult to port.

We propose that the traditional functionality of parallel file systems be separated into two components: a fixed core that is standard on all platforms, encapsulating only primitive abstractions and interfaces, and a set of high-level libraries to provide a variety of abstractions and application-programmer interfaces (APIs).

We present our current and next-generation file systems as examples of this structure. Their features, such as a three-dimensional file structure, strided read and write interfaces, and I/O-node programs, are specifically designed with the flexibility and performance necessary to support a wide range of applications.


Joel T. Thomas. The Panda Array I/O Library on the Galley Parallel File System. Technical Report, June 1996. Available as Dartmouth Computer Science Technical Report PCS-TR96-288. [Details]

The Panda Array I/O library, created at the University of Illinois, Urbana-Champaign, was built especially to address the needs of high-performance scientific applications. I/O has been one of the most frustrating bottlenecks to high performance for quite some time, and the Panda project is an attempt to ameliorate this problem while still providing the user with a simple, high-level interface. The Galley File System, with its hierarchical structure of files and strided requests, is another attempt at addressing the performance problem. My project was to redesign the Panda Array library for use on the Galley file system. This project involved porting Panda's three main functions: a checkpoint function for writing a large array periodically for 'safekeeping,' a restart function that would allow a checkpointed file to be read back in, and finally a timestep function that would allow the user to write a group of large arrays several times in a sequence. Panda supports several different distributions in both the compute-node memories and I/O-node disks.

We have found that the Galley File System provides a good environment on which to build high-performance libraries, and that the mesh of Panda and Galley was a successful combination.


Apratim Purakayastha, Carla Schlatter Ellis, and David Kotz. ENWRICH: A Compute-Processor Write Caching Scheme for Parallel File Systems. Proceedings of the Workshop on Input/Output in Parallel and Distributed Systems (IOPADS). May 1996. [Details]

Many parallel scientific applications need high-performance I/O. Unfortunately, end-to-end parallel-I/O performance has not been able to keep up with substantial improvements in parallel-I/O hardware because of poor parallel file-system software. Many radical changes, both at the interface level and the implementation level, have recently been proposed. One such proposed interface is collective I/O, which allows parallel jobs to request transfer of large contiguous objects in a single request, thereby preserving useful semantic information that would otherwise be lost if the transfer were expressed as per-processor non-contiguous requests. Kotz has proposed disk-directed I/O as an efficient implementation technique for collective-I/O operations, where the compute processors make a single collective data-transfer request, and the I/O processors thereafter take full control of the actual data transfer, exploiting their detailed knowledge of the disk-layout to attain substantially improved performance.

Recent parallel file-system usage studies show that writes to write-only files are a dominant part of the workload. Therefore, optimizing writes could have a significant impact on overall performance. In this paper, we propose ENWRICH, a compute-processor write-caching scheme for write-only files in parallel file systems. ENWRICH combines low-overhead write caching at the compute processors with high-performance disk-directed I/O at the I/O processors to achieve both low latency and high bandwidth. This combination facilitates the use of the powerful disk-directed I/O technique independent of any particular choice of interface. By collecting writes over many files and applications, ENWRICH lets the I/O processors optimize disk I/O over a large pool of requests. We evaluate our design via simulated implementation and show that ENWRICH achieves high performance for various configurations and workloads.
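
A minimal sketch of the compute-processor side of such a scheme, assuming a simple append-only buffer per processor (names and sizes are illustrative): each write costs only a local memory copy, and the I/O processors receive work in large batches that they can then schedule with disk-directed I/O.

    #include <stdio.h>
    #include <string.h>

    #define CACHE_BYTES 4096

    struct wcache { char buf[CACHE_BYTES]; long used; };

    /* Placeholder for handing a full batch to the I/O processors, which
     * schedule the actual disk writes (e.g., via disk-directed I/O). */
    static void flush_to_ioprocs(struct wcache *c)
    {
        printf("flushing %ld cached bytes to I/O processors\n", c->used);
        c->used = 0;
    }

    /* A write to a write-only file costs only a local memory copy. */
    static void cached_write(struct wcache *c, const void *data, long len)
    {
        if (c->used + len > CACHE_BYTES)
            flush_to_ioprocs(c);
        memcpy(c->buf + c->used, data, len);
        c->used += len;
    }

    int main(void)
    {
        struct wcache c = { .used = 0 };
        char record[512] = { 0 };
        for (int i = 0; i < 20; i++)
            cached_write(&c, record, sizeof record);
        flush_to_ioprocs(&c);   /* final flush at close time */
        return 0;
    }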


Nils Nieuwejaar and David Kotz. Performance of the Galley Parallel File System. Proceedings of the Workshop on Input/Output in Parallel and Distributed Systems (IOPADS). May 1996. [Details]

As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. Initial experiments, reported in this paper, indicate that Galley is capable of providing high-performance I/O to applications that access data in patterns that have been observed to be common.

Nils Nieuwejaar and David Kotz. The Galley Parallel File System. Proceedings of the ACM International Conference on Supercomputing (ICS). May 1996. [Details]

As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley’s file structure and application interface, as well as an application that has been implemented using that interface.

Nils Nieuwejaar and David Kotz. The Galley Parallel File System. Technical Report, May 1996. [Details]

Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley’s file structure and application interface, as well as the performance advantages offered by that interface.

David Kotz and Nils Nieuwejaar. Flexibility and Performance of Parallel File Systems. ACM Operating Systems Review. April 1996. [Details]

Many scientific applications for high-performance multiprocessors have tremendous I/O requirements. As a result, the I/O system is often the limiting factor of application performance. Several new parallel file systems have been developed in recent years, each promising better performance for some class of parallel applications. As we gain experience with parallel computing, and parallel file systems in particular, it becomes increasingly clear that a single solution does not suit all applications. For example, it appears to be impossible to find a single appropriate interface, caching policy, file structure, or disk management strategy. Furthermore, the proliferation of file-system interfaces and abstractions makes application portability a significant problem.

We propose that the traditional functionality of parallel file systems be separated into two components: a fixed core that is standard on all platforms, encapsulating only primitive abstractions and interfaces, and a set of high-level libraries to provide a variety of abstractions and application-programmer interfaces (APIs). We think of this approach as the “RISC” of parallel file-system design.

We present our current and next-generation file systems as examples of this structure. Their features, such as a three-dimensional file structure, strided read and write interfaces, and I/O-node programs, are specifically designed with the flexibility and performance necessary to support a wide range of applications.


David Kotz. Parallel File Systems. A multimedia lecture included in the CD-ROM “Introductory Lectures on Data-Parallel Computing”, published by AK Peters, Ltd., March 1996. [Details]

1995:
Apratim Purakayastha, Carla Schlatter Ellis, and David Kotz. ENWRICH: A Compute-Processor Write Caching Scheme for Parallel File Systems. Technical Report, October 1995. [Details]

Many parallel scientific applications need high-performance I/O. Unfortunately, end-to-end parallel-I/O performance has not been able to keep up with substantial improvements in parallel-I/O hardware because of poor parallel file-system software. Many radical changes, both at the interface level and the implementation level, have recently been proposed. One such proposed interface is collective I/O, which allows parallel jobs to request transfer of large contiguous objects in a single request, thereby preserving useful semantic information that would otherwise be lost if the transfer were expressed as per-processor non-contiguous requests. Kotz has proposed disk-directed I/O as an efficient implementation technique for collective-I/O operations, where the compute processors make a single collective data-transfer request, and the I/O processors thereafter take full control of the actual data transfer, exploiting their detailed knowledge of the disk-layout to attain substantially improved performance.

Recent parallel file-system usage studies show that writes to write-only files are a dominant part of the workload. Therefore, optimizing writes could have a significant impact on overall performance. In this paper, we propose ENWRICH, a compute-processor write-caching scheme for write-only files in parallel file systems. ENWRICH combines low-overhead write caching at the compute processors with high-performance disk-directed I/O at the I/O processors to achieve both low latency and high bandwidth. This combination facilitates the use of the powerful disk-directed I/O technique independent of any particular choice of interface. By collecting writes over many files and applications, ENWRICH lets the I/O processors optimize disk I/O over a large pool of requests. We evaluate our design via simulated implementation and show that ENWRICH achieves high performance for various configurations and workloads.


David Kotz. Expanding the Potential for Disk-Directed I/O. Proceedings of the IEEE Symposium on Parallel and Distributed Processing (SPDP). October 1995. [Details]

As parallel computers are increasingly used to run scientific applications with large data sets, and as processor speeds continue to increase, it becomes more important to provide fast, effective parallel file systems for data storage and for temporary files. In an earlier work we demonstrated that a technique we call disk-directed I/O has the potential to provide consistent high performance for large, collective, structured I/O requests. In this paper we expand on this potential by demonstrating the ability of a disk-directed I/O system to read irregular subsets of data from a file, and to filter and distribute incoming data according to data-dependent functions.

David Kotz. Interfaces for Disk-Directed I/O. Technical Report, September 1995. [Details]

In other papers I propose the idea of disk-directed I/O for multiprocessor file systems. Those papers focus on the performance advantages and capabilities of disk-directed I/O, but say little about the application-programmer’s interface or about the interface between the compute processors and I/O processors. In this short note I discuss the requirements for these interfaces, and look at many existing interfaces for parallel file systems. I conclude that many of the existing interfaces could be adapted for use in a disk-directed I/O system.

David Kotz. Disk-directed I/O for an Out-of-core Computation. Proceedings of the IEEE International Symposium on High Performance Distributed Computing (HPDC). August 1995. [Details]

New file systems are critical to obtain good I/O performance on large multiprocessors. Several researchers have suggested the use of collective file-system operations, in which all processes in an application cooperate in each I/O request. Others have suggested that the traditional low-level interface (read, write, seek) be augmented with various higher-level requests (e.g., read matrix). Collective, high-level requests permit a technique called disk-directed I/O to significantly improve performance over traditional file systems and interfaces, at least on simple I/O benchmarks. In this paper, we present the results of experiments with an “out-of-core” LU-decomposition program. Although its collective interface was awkward in some places, and forced additional synchronization, disk-directed I/O was able to obtain much better overall performance than the traditional system.
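
The out-of-core pattern at issue is easy to sketch: the matrix stays on disk, and each step issues one collective, high-level request for a panel that fits in memory. In the sketch below, read_blockcol() and write_blockcol() are placeholders for such collective matrix-section requests, not any particular system's API.

    #include <stddef.h>

    #define N     1024   /* matrix is N x N on disk, too large for memory */
    #define BCOLS 64     /* panel width that does fit in memory */

    /* Placeholders for collective, high-level matrix-section requests. */
    static void read_blockcol(double *panel, int col0)
    { (void)panel; (void)col0; /* collective read of columns col0.. */ }
    static void write_blockcol(double *panel, int col0)
    { (void)panel; (void)col0; /* collective write-back */ }

    int main(void)
    {
        static double panel[N * BCOLS];

        for (int col0 = 0; col0 < N; col0 += BCOLS) {
            read_blockcol(panel, col0);   /* all processes request together */
            /* ... factor the panel and update it in memory ... */
            write_blockcol(panel, col0);
        }
        return 0;
    }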

Nils Nieuwejaar, David Kotz, Apratim Purakayastha, Carla Schlatter Ellis, and Michael Best. File-Access Characteristics of Parallel Scientific Workloads. Technical Report, August 1995. [Details]

Phenomenal improvements in the computational performance of multiprocessors have not been matched by comparable gains in I/O system performance. This imbalance has resulted in I/O becoming a significant bottleneck for many scientific applications. One key to overcoming this bottleneck is improving the performance of parallel file systems.

The design of a high-performance parallel file system requires a comprehensive understanding of the expected workload. Unfortunately, until recently, no general workload studies of parallel file systems have been conducted. The goal of the CHARISMA project was to remedy this problem by characterizing the behavior of several production workloads, on different machines, at the level of individual reads and writes. The first set of results from the CHARISMA project describe the workloads observed on an Intel iPSC/860 and a Thinking Machines CM-5. This paper is intended to compare and contrast these two workloads for an understanding of their essential similarities and differences, isolating common trends and platform-dependent variances. Using this comparison, we are able to gain more insight into the general principles that should guide parallel file-system design.


Daniel A. Reed, Charles Catlett, Alok Choudhary, David Kotz, and Marc Snir. Parallel I/O: Getting Ready for Prime Time. IEEE Parallel and Distributed Technology. Summer 1995. Edited transcript of panel discussion at the 1994 International Conference on Parallel Processing. [Details]

During the International Conference on Parallel Processing, held August 15-19, 1994, we convened a panel to discuss the state of the art in parallel I/O, tools and techniques to address current problems, and challenges for the future. The following is an edited transcript of that panel.

Apratim Purakayastha, Carla Schlatter Ellis, David Kotz, Nils Nieuwejaar, and Michael Best. Characterizing Parallel File-Access Patterns on a Large-Scale Multiprocessor. Proceedings of the International Parallel Processing Symposium (IPPS). April 1995. [Details]

High-performance parallel file systems are needed to satisfy tremendous I/O requirements of parallel scientific applications. The design of such high-performance parallel file systems depends on a comprehensive understanding of the expected workload, but so far there have been very few usage studies of multiprocessor file systems. This paper is part of the CHARISMA project, which intends to fill this void by measuring real file-system workloads on various production parallel machines. In particular, here we present results from the CM-5 at the National Center for Supercomputing Applications. Our results are unique because we collect information about nearly every individual I/O request from the mix of jobs running on the machine. Analysis of the traces leads to various recommendations for parallel file-system design.

David Kotz and Ting Cai. Exploring the use of I/O Nodes for Computation in a MIMD Multiprocessor. Proceedings of the IPPS Workshop on Input/Output in Parallel and Distributed Systems (IOPADS). April 1995. [Details]

As parallel systems move into the production scientific-computing world, the emphasis will be on cost-effective solutions that provide high throughput for a mix of applications. Cost-effective solutions demand that a system make effective use of all of its resources. Many MIMD multiprocessors today, however, distinguish between “compute” and “I/O” nodes, the latter having attached disks and being dedicated to running the file-system server. This static division of responsibilities simplifies system management but does not necessarily lead to the best performance in workloads that need a different balance of computation and I/O.

Of course, computational processes sharing a node with a file-system service may receive less CPU time, network bandwidth, and memory bandwidth than they would on a computation-only node. In this paper we begin to examine this issue experimentally. We found that high-performance I/O does not necessarily require substantial CPU time, leaving plenty of time for application computation. There were some complex file-system requests, however, which left little CPU time available to the application. (The impact on network and memory bandwidth still needs to be determined.) For applications (or users) that cannot tolerate an occasional interruption, we recommend that they continue to use only compute nodes. For tolerant applications needing more cycles than those provided by the compute nodes, we recommend that they take full advantage of both compute and I/O nodes for computation, and that operating systems should make this possible.


David Kotz and Nils Nieuwejaar. File-System Workload on a Scientific Multiprocessor. IEEE Parallel and Distributed Technology. Spring 1995. [Details]

The Charisma project records individual read and write requests in live, multiprogramming parallel workloads. This information can be used to design more efficient multiprocessor systems. We present the first results from the project: a characterization of the file-system workload on an iPSC/860 multiprocessor running production, parallel scientific applications at NASA Ames Research Center. We use the resulting information to address the following questions: What did the job mix look like (that is, how many jobs ran concurrently)? How many files were read and written? Which were temporary files? What were their sizes? What were typical read and write request sizes, and how were they spaced in the file? Were the accesses sequential? What forms of locality were there? How might caching be useful? What are the implications for file-system design?

Nils Nieuwejaar and David Kotz. Low-level Interfaces for High-level Parallel I/O. Proceedings of the IPPS Workshop on Input/Output in Parallel and Distributed Systems (IOPADS). April 1995. [Details]

As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. By tracing all the activity of a parallel file system in a production, scientific computing environment, we show that many applications exhibit highly regular, but non-consecutive I/O access patterns. Since the conventional interface does not provide an efficient method of describing these patterns, we present three extensions to the interface that support strided, nested-strided, and nested-batched I/O requests. We show how these extensions can be used to express common access patterns.

David Kotz. Expanding the Potential for Disk-Directed I/O. Technical Report, March 1995. [Details]

As parallel computers are increasingly used to run scientific applications with large data sets, and as processor speeds continue to increase, it becomes more important to provide fast, effective parallel file systems for data storage and for temporary files. In an earlier work we demonstrated that a technique we call disk-directed I/O has the potential to provide consistent high performance for large, collective, structured I/O requests. In this paper we expand on this potential by demonstrating the ability of a disk-directed I/O system to read irregular subsets of data from a file, and to filter and distribute incoming data according to data-dependent functions.

Nils Nieuwejaar and David Kotz. Low-level Interfaces for High-level Parallel I/O. Technical Report, March 1995. Revised 4/18/95 and appeared in IOPADS workshop at IPPS’95. [Details]

As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. By tracing all the activity of a parallel file system in a production, scientific computing environment, we show that many applications exhibit highly regular, but non-consecutive I/O access patterns. Since the conventional interface does not provide an efficient method of describing these patterns, we present three extensions to the interface that support strided, nested-strided, and nested-batched I/O requests. We show how these extensions can be used to express common access patterns.

David Kotz. Disk-directed I/O for an Out-of-core Computation. Technical Report, January 1995. [Details]

New file systems are critical to obtain good I/O performance on large multiprocessors. Several researchers have suggested the use of collective file-system operations, in which all processes in an application cooperate in each I/O request. Others have suggested that the traditional low-level interface (read, write, seek) be augmented with various higher-level requests (e.g., read matrix), allowing the programmer to express a complex transfer in a single (perhaps collective) request. Collective, high-level requests permit techniques like two-phase I/O and disk-directed I/O to significantly improve performance over traditional file systems and interfaces. Neither of these techniques has been tested on anything other than simple benchmarks that read or write matrices. Many applications, however, intersperse computation and I/O to work with data sets that cannot fit in main memory. In this paper, we present the results of experiments with an “out-of-core” LU-decomposition program, comparing a traditional interface and file system with a system that has a high-level, collective interface and disk-directed I/O. We found that a collective interface was awkward in some places, and forced additional synchronization. Nonetheless, disk-directed I/O was able to obtain much better performance than the traditional system.

1994:
David Kotz. HP 97560 disk simulation module. Used in STARFISH and several other research projects, 1994. [Details]

We implemented a detailed model of the HP 97560 disk drive, to replicate a model devised by Ruemmler and Wilkes (both of Hewlett-Packard).

David Kotz. Disk-directed I/O for MIMD Multiprocessors. Proceedings of the Symposium on Operating Systems Design and Implementation (OSDI). November 1994. Updated as Dartmouth TR PCS-TR94-226 on November 8, 1994. [Details]

Many scientific applications that run on today’s multiprocessors are bottlenecked by their file I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth to the application. Although libraries and improved file-system interfaces can make a significant improvement, we believe that fundamental changes are needed in the file-server software. We propose a new technique, disk-directed I/O, that flips the usual relationship between server and client to allow the disks (actually, disk servers) to determine the flow of data for maximum performance. Our simulations show that tremendous performance gains are possible. Indeed, disk-directed I/O provided consistent high performance that was largely independent of data distribution, and close to the maximum disk bandwidth.

David Kotz and Nils Nieuwejaar. Dynamic File-Access Characteristics of a Production Parallel Scientific Workload. Proceedings of Supercomputing. November 1994. [Details]

Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors.

Most successful systems are based on a solid understanding of the characteristics of the expected workload, but until now there have been no comprehensive workload characterizations of multiprocessor file systems. We began the CHARISMA project in an attempt to fill that gap. We instrumented the common node library on the iPSC/860 at NASA Ames to record all file-related activity over a two-week period. Our instrumentation is different from previous efforts in that it collects information about every read and write request and about the mix of jobs running in the machine (rather than from selected applications).

The trace analysis in this paper leads to many recommendations for designers of multiprocessor file systems. First, the file system should support simultaneous access to many different files by many jobs. Second, it should expect to see many small requests, predominantly sequential and regular access patterns (although of a different form than in uniprocessors), little or no concurrent file-sharing between jobs, significant byte- and block-sharing between processes within jobs, and strong interprocess locality. Third, our trace-driven simulations showed that these characteristics led to great success in caching, both at the compute nodes and at the I/O nodes. Finally, we recommend supporting strided I/O requests in the file-system interface, to reduce overhead and allow more performance optimization by the file system.


Apratim Purakayastha, Carla Schlatter Ellis, David Kotz, Nils Nieuwejaar, and Michael Best. Characterizing Parallel File-Access Patterns on a Large-Scale Multiprocessor. Technical Report, October 1994. [Details]

Rapid increases in the computational speeds of multiprocessors have not been matched by corresponding performance enhancements in the I/O subsystem. To satisfy the large and growing I/O requirements of some parallel scientific applications, we need parallel file systems that can provide high-bandwidth and high-volume data transfer between the I/O subsystem and thousands of processors.

Design of such high-performance parallel file systems depends on a thorough grasp of the expected workload. So far there have been no comprehensive usage studies of multiprocessor file systems. Our CHARISMA project intends to fill this void. The first results from our study involve an iPSC/860 at NASA Ames. This paper presents results from a different platform, the CM-5 at the National Center for Supercomputing Applications. The CHARISMA studies are unique because we collect information about every individual read and write request and about the entire mix of applications running on the machines.

The results of our trace analysis lead to recommendations for parallel file system design. First, the file system should support efficient concurrent access to many files, and I/O requests from many jobs under varying load conditions. Second, it must efficiently manage large files kept open for long periods. Third, it should expect to see small requests, predominantly sequential access patterns, application-wide synchronous access, no concurrent file-sharing between jobs, appreciable byte and block sharing between processes within jobs, and strong interprocess locality. Finally, the trace data suggest that node-level write caches and collective I/O request interfaces may be useful in certain environments.


David Kotz. Disk-directed I/O for MIMD Multiprocessors. Bulletin of the IEEE Technical Committee on Operating Systems and Application Environments. Autumn 1994. [Details]

Many scientific applications that run on today’s multiprocessors are bottlenecked by their file I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth to the application. Although libraries and improved file-system interfaces can make a significant improvement, we believe that fundamental changes are needed in the file-server software. We propose a new technique, disk-directed I/O, that flips the usual relationship between server and client to allow the disks (actually, disk servers) to determine the flow of data for maximum performance. Our simulations show that tremendous performance gains are possible. Indeed, disk-directed I/O provided consistent high performance that was largely independent of data distribution, and close to the maximum disk bandwidth.

David Kotz and Ting Cai. Exploring the use of I/O Nodes for Computation in a MIMD Multiprocessor. Technical Report, October 1994. Revised 2/20/95. [Details]

Most MIMD multiprocessors today are configured with two distinct types of processor nodes: those that have disks attached, which are dedicated to file I/O, and those that do not have disks attached, which are used for running applications. Several architectural trends have led some to propose configuring systems so that all processors are used for application processing, even those with disks attached. We examine this idea experimentally, focusing on the impact of remote I/O requests on local computational processes. We found that in an efficient file system the I/O processors can transfer data at near peak speeds with little CPU overhead, leaving substantial CPU power for running applications. On the other hand, we found that some complex file-system features could require substantial CPU overhead. Thus, for a multiprocessor system to obtain good I/O and computational performance on a mix of applications, the file system (both operating system and libraries) must be prepared to adapt its policies to changing conditions.

Nils Nieuwejaar and David Kotz. A Multiprocessor Extension to the Conventional File System Interface. Technical Report, September 1994. [Details]

As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. By tracing all the activity of a parallel file system in a production, scientific computing environment, we show that many applications exhibit highly regular, but non-consecutive I/O access patterns. Since the conventional interface does not provide an efficient method of describing these patterns, we present an extension which supports strided and nested-strided I/O requests.

David Kotz. Disk-directed I/O for MIMD Multiprocessors. Technical Report, July 1994. Revised November 8, 1994. [Details]

Many scientific applications that run on today’s multiprocessors are bottlenecked by their file I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth to the application. Although libraries and improved file-system interfaces can make a significant improvement, we believe that fundamental changes are needed in the file-server software. We propose a new technique, disk-directed I/O, that flips the usual relationship between server and client to allow the disks (actually, disk servers) to determine the flow of data for maximum performance. Our simulations show that tremendous performance gains are possible. Indeed, disk-directed I/O provided consistent high performance that was largely independent of data distribution, and close to the maximum disk bandwidth.

David Kotz, Song Bac Toh, and Sriram Radhakrishnan. A Detailed Simulation Model of the HP 97560 Disk Drive. Technical Report, July 1994. [Details]

We implemented a detailed model of the HP 97560 disk drive, to replicate a model devised by Ruemmler and Wilkes (both of Hewlett-Packard, HP). Our model simulates one or more disk drives attached to one or more SCSI buses, using a small discrete-event simulation module included in our implementation. The design is broken into three components: a test driver, the disk model itself, and the discrete-event simulation support. Thus, the disk model can be easily extracted and used in other simulation environments. We validated our model using traces obtained from HP, using the same “demerit” measure as Ruemmler and Wilkes. We obtained a demerit figure of 3.9%, indicating that our model was extremely accurate. This paper describes our implementation, and is meant for those wishing to understand our model or to implement their own.
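
For readers unfamiliar with the mechanism, the skeleton below shows the kind of tiny discrete-event loop such a disk model plugs into; the structure and timings are invented for illustration and are not the testbed's actual code.

    #include <stdio.h>

    #define MAXEV 64

    struct event { double time; void (*fire)(double); };

    static struct event queue[MAXEV];
    static int nev;

    static void schedule(double t, void (*fn)(double))
    {
        queue[nev].time = t;
        queue[nev].fire = fn;
        nev++;
    }

    static void seek_done(double now) { printf("%6.2f ms: seek done\n", now); }
    static void xfer_done(double now) { printf("%6.2f ms: transfer done\n", now); }

    int main(void)
    {
        /* A request arrives at t=0: model the seek, then the transfer. */
        schedule(8.5, seek_done);    /* assumed seek time */
        schedule(11.2, xfer_done);   /* assumed rotation + transfer time */

        /* Event loop: repeatedly fire the earliest pending event. */
        while (nev > 0) {
            int min = 0;
            for (int i = 1; i < nev; i++)
                if (queue[i].time < queue[min].time)
                    min = i;
            struct event e = queue[min];
            queue[min] = queue[--nev];
            e.fire(e.time);
        }
        return 0;
    }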

David Kotz and Nils Nieuwejaar. Dynamic File-Access Characteristics of a Production Parallel Scientific Workload. Technical Report, April 1994. Revised May 11, 1994. [Details]

Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors.

Most successful systems are based on a solid understanding of the characteristics of the expected workload, but until now there have been no comprehensive workload characterizations of multiprocessor file systems. We began the CHARISMA project in an attempt to fill that gap. We instrumented the common node library on the iPSC/860 at NASA Ames to record all file-related activity over a two-week period. Our instrumentation is different from previous efforts in that it collects information about every read and write request and about the mix of jobs running in the machine (rather than from selected applications).

The trace analysis in this paper leads to many recommendations for designers of multiprocessor file systems. First, the file system should support simultaneous access to many different files by many jobs. Second, it should expect to see many small requests, predominantly sequential and regular access patterns (although of a different form than in uniprocessors), little or no concurrent file-sharing between jobs, significant byte- and block-sharing between processes within jobs, and strong interprocess locality. Third, our trace-driven simulations showed that these characteristics led to great success in caching, both at the compute nodes and at the I/O nodes. Finally, we recommend supporting strided I/O requests in the file-system interface, to reduce overhead and allow more performance optimization by the file system.


1993:
Thomas H. Cormen and David Kotz. Integrating Theory and Practice in Parallel File Systems. Proceedings of the Dartmouth Institute for Advanced Graduate Studies (DAGS). June 1993. Revised as Dartmouth PCS-TR93-188 on 9/20/94. [Details]

Several algorithms for parallel disk systems have appeared in the literature recently, and they are asymptotically optimal in terms of the number of disk accesses. Scalable systems with parallel disks must be able to run these algorithms. We present for the first time a list of capabilities that must be provided by the system to support these optimal algorithms: control over declustering, querying about the configuration, independent I/O, and turning off parity, file caching, and prefetching. We summarize recent theoretical and empirical work that justifies the need for these capabilities. In addition, we sketch an organization for a parallel file interface with low-level primitives and higher-level operations.
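
As a rough illustration of what such low-level primitives might look like in C, consider the interface sketch below; every name is invented here, and the paper's own interface differs in detail.

    #include <stddef.h>
    #include <sys/types.h>

    struct pfs_config {
        int    ndisks;      /* how many disks back the file system */
        size_t blocksize;   /* block size in bytes */
    };

    /* Query the configuration so an out-of-core algorithm can plan. */
    int pfs_get_config(struct pfs_config *out);

    /* Open a file declustered block-by-block over `ndisks` disks. */
    int pfs_open_declustered(const char *path, int ndisks);

    /* Per-file switches for features that optimal parallel-disk
     * algorithms prefer to manage themselves. */
    int pfs_set_caching(int fd, int on);
    int pfs_set_prefetching(int fd, int on);
    int pfs_set_parity(int fd, int on);

    /* Independent I/O: read one block from one specific disk. */
    ssize_t pfs_read_block(int fd, int disk, long block, void *buf);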

David Kotz. Throughput of Existing Multiprocessor File Systems. Technical Report, May 1993. [Details]

Fast file systems are critical for high-performance scientific computing, since many scientific applications have tremendous I/O requirements. Many parallel supercomputers have only recently obtained fully parallel I/O architectures and file systems, which are necessary for scalable I/O performance. Scalability aside, I show here that many systems lack sufficient absolute performance. I do this by surveying the performance reported in the literature, summarized in an informal table.

Thomas H. Cormen and David Kotz. Integrating Theory and Practice in Parallel File Systems. Technical Report, March 1993. Revised 9/20/94. [Details]

Several algorithms for parallel disk systems have appeared in the literature recently, and they are asymptotically optimal in terms of the number of disk accesses. Scalable systems with parallel disks must be able to run these algorithms. We present a list of capabilities that must be provided by the system to support these optimal algorithms: control over declustering, querying about the configuration, independent I/O, turning off file caching and prefetching, and bypassing parity. We summarize recent theoretical and empirical work that justifies the need for these capabilities.

David Kotz. Multiprocessor File System Interfaces. Proceedings of the International Conference on Parallel and Distributed Information Systems (PDIS). January 1993. [Details]

Increasingly, file systems for multiprocessors are designed with parallel access to multiple disks, to keep I/O from becoming a serious bottleneck for parallel applications. Although file system software can transparently provide high-performance access to parallel disks, a new file system interface is needed to facilitate parallel access to a file from a parallel application. We describe the difficulties faced when using the conventional (Unix-like) interface in parallel applications, and then outline ways to extend the conventional interface to provide convenient access to the file for parallel programs, while retaining the traditional interface for programs that have no need for explicitly parallel file access. Our interface includes a single naming scheme, a multiopen operation, local and global file pointers, mapped file pointers, logical records, multifiles, and logical coercion for backward compatibility.
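
The global-file-pointer idea, in particular, amounts to an atomically shared offset; the toy C program below simulates that semantics with an atomic counter (all names invented here; the actual interface is defined in the paper).

    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_long global_ptr;   /* one shared offset, in records */

    /* Each reader atomically claims the next record, so concurrent
     * processes consume disjoint parts of the file. */
    static long claim_next_record(void)
    {
        return atomic_fetch_add(&global_ptr, 1);
    }

    int main(void)
    {
        /* Two simulated processes taking turns; with real processes,
         * the atomic increment is what keeps their records disjoint. */
        for (int i = 0; i < 4; i++)
            printf("process %d reads record %ld\n",
                   i % 2, claim_next_record());
        return 0;
    }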

David Kotz and Carla Schlatter Ellis. Practical Prefetching Techniques for Multiprocessor File Systems. Journal of Distributed and Parallel Databases. January 1993. [Details]

Improvements in the processing speed of multiprocessors are outpacing improvements in the speed of disk hardware. Parallel disk I/O subsystems have been proposed as one way to close the gap between processor and disk speeds. In a previous paper we showed that prefetching and caching have the potential to deliver the performance benefits of parallel file systems to parallel applications. In this paper we describe experiments with practical prefetching policies that base decisions only on on-line reference history, and that can be implemented efficiently. We also test the ability of these policies across a range of architectural parameters.
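
A minimal example of a policy that uses only on-line reference history is one-block lookahead: after two consecutive references, prefetch the next block. The C sketch below is illustrative only and far simpler than the policies evaluated in the paper.

    #include <stdio.h>

    static long last_block = -2;     /* sentinel: no history yet */

    /* Called on each reference; returns a block to prefetch, or -1. */
    static long predict_next(long block)
    {
        long guess = (block == last_block + 1) ? block + 1 : -1;
        last_block = block;
        return guess;
    }

    int main(void)
    {
        long refs[] = { 3, 4, 5, 9, 10 };
        for (int i = 0; i < 5; i++) {
            long p = predict_next(refs[i]);
            if (p >= 0)
                printf("after block %ld, prefetch block %ld\n", refs[i], p);
        }
        return 0;
    }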

David Kotz and Carla Schlatter Ellis. Caching and Writeback Policies in Parallel File Systems. Journal of Parallel and Distributed Computing. January 1993. [Details]

Improvements in the processing speed of multiprocessors are outpacing improvements in the speed of disk hardware. Parallel disk I/O subsystems have been proposed as one way to close the gap between processor and disk speeds. Such parallel disk systems require parallel file system software to avoid performance-limiting bottlenecks. We discuss cache management techniques that can be used in a parallel file system implementation for multiprocessors with scientific workloads. We examine several writeback policies, and give results of experiments that test their performance.
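
The core trade-off between write-through and delayed write-back can be seen in a few lines of C; the toy cache below (invented for illustration, not the paper's code) shows how delaying writes absorbs repeated overwrites of the same block.

    #include <stdbool.h>
    #include <stdio.h>

    #define CACHE_BLOCKS 4

    static bool dirty[CACHE_BLOCKS];
    static int  disk_writes;

    static void write_block(int b, bool write_through)
    {
        if (write_through)
            disk_writes++;       /* every write goes straight to disk */
        else
            dirty[b] = true;     /* just mark dirty; flush later */
    }

    static void flush_cache(void)
    {
        for (int b = 0; b < CACHE_BLOCKS; b++)
            if (dirty[b]) { disk_writes++; dirty[b] = false; }
    }

    int main(void)
    {
        /* Overwrite block 0 three times under each policy. */
        disk_writes = 0;
        for (int i = 0; i < 3; i++) write_block(0, true);
        printf("write-through: %d disk writes\n", disk_writes);

        disk_writes = 0;
        for (int i = 0; i < 3; i++) write_block(0, false);
        flush_cache();
        printf("delayed write-back: %d disk writes\n", disk_writes);
        return 0;
    }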

1992:
David Kotz. Multiprocessor File System Interfaces. Proceedings of the USENIX File Systems Workshop (WOFS). May 1992. [Details]
David Kotz. Multiprocessor File System Interfaces. Technical Report, March 1992. [Details]

Increasingly, file systems for multiprocessors are designed with parallel access to multiple disks, to keep I/O from becoming a serious bottleneck for parallel applications. Although file system software can transparently provide high-performance access to parallel disks, a new file system interface is needed to facilitate parallel access to a file from a parallel application. We describe the difficulties faced when using the conventional (Unix-like) interface in parallel applications, and then outline ways to extend the conventional interface to provide convenient access to the file for parallel programs, while retaining the traditional interface for programs that have no need for explicitly parallel file access. Our interface includes a single naming scheme, a multiopen operation, local and global file pointers, mapped file pointers, logical records, multifiles, and logical coercion for backward compatibility.

1991:
David Kotz. RAPID-Transit parallel file-system simulator. The software basis for my Ph.D. dissertation, 1991. [Details]

RAPID-Transit was a testbed for experimenting with caching and prefetching algorithms in parallel file systems (RAPID means “Read-Ahead for Parallel Independent Disks”), and was part of the larger NUMAtic project at Duke University. The testbed ran on Duke’s 64-processor Butterfly GP1000. In the model we used, a disk was attached to every processor and each file was striped across all disks. Of course, Duke’s GP1000 had only one real disk, so our testbed simulated its disks. The implementation and some of the policies were dependent on the shared-memory nature of the machine; for example, there was a single shared file cache accessible to all processors. We found several policies that were successful at prefetching in a variety of parallel file-access patterns.

David Kotz and Carla Schlatter Ellis. Practical Prefetching Techniques for Parallel File Systems. Proceedings of the International Conference on Parallel and Distributed Information Systems (PDIS). December 1991. [Details]

Parallel disk subsystems have been proposed as one way to close the gap between processor and disk speeds. In a previous paper we showed that prefetching and caching have the potential to deliver the performance benefits of parallel file systems to parallel applications. In this paper we describe experiments with practical prefetching policies, and show that prefetching can be implemented efficiently even for the more complex parallel file access patterns. We test these policies across a range of architectural parameters.

David Kotz and Carla Schlatter Ellis. Caching and Writeback Policies in Parallel File Systems. Proceedings of the IEEE Symposium on Parallel and Distributed Processing (SPDP). December 1991. [Details]

Improvements in the processing speed of multiprocessors are outpacing improvements in the speed of disk hardware. Parallel disk I/O subsystems have been proposed as one way to close the gap between processor and disk speeds. Such parallel disk systems require parallel file system software to avoid performance-limiting bottlenecks. We discuss cache management techniques that can be used in a parallel file system implementation. We examine several writeback policies, and give results of experiments that test their performance.

David Kotz. Prefetching and Caching Techniques in File Systems for MIMD Multiprocessors. PhD thesis, April 1991. Available as technical report CS-1991-016. [Details]

The increasing speed of the most powerful computers, especially multiprocessors, makes it difficult to provide sufficient I/O bandwidth to keep them running at full speed for the largest problems. Trends show that the difference in the speed of disk hardware and the speed of processors is increasing, with I/O severely limiting the performance of otherwise fast machines. This widening access-time gap is known as the “I/O bottleneck crisis.” One solution to the crisis, suggested by many researchers, is to use many disks in parallel to increase the overall bandwidth.

This dissertation studies some of the file system issues needed to get high performance from parallel disk systems, since parallel hardware alone cannot guarantee good performance. The target systems are large MIMD multiprocessors used for scientific applications, with large files spread over multiple disks attached in parallel. The focus is on automatic caching and prefetching techniques. We show that caching and prefetching can transparently provide the power of parallel disk hardware to both sequential and parallel applications using a conventional file system interface. We also propose a new file system interface (compatible with the conventional interface) that could make it easier to use parallel disks effectively.

Our methodology is a mixture of implementation and simulation, using a software testbed that we built to run on a BBN GP1000 multiprocessor. The testbed simulates the disks and fully implements the caching and prefetching policies. Driving the testbed with a synthetic workload, we ran an extensive set of experiments. The results show that prefetching and caching improved the performance of parallel file systems, often dramatically.


C. Ellis, M. Holliday, R. LaRowe, D. Kotz, V. Khera, S. Owen, and C. Connelly. NUMAtic Project and the DUnX OS. IEEE Technical Committee on Operating Systems and Application Environments (Newsletter). Winter 1991. [Details]
1990:
David F. Kotz and Carla Schlatter Ellis. Prefetching in File Systems for MIMD Multiprocessors. IEEE Transactions on Parallel and Distributed Systems. April 1990. [Details]

The problem of providing file I/O to parallel programs has been largely neglected in the development of multiprocessor systems. There are two essential elements of any file system design intended for a highly parallel environment: parallel I/O and effective caching schemes. This paper concentrates on the second aspect of file system design and specifically, on the question of whether prefetching blocks of the file into the block cache can effectively reduce overall execution time of a parallel computation, even under favorable assumptions.

Experiments have been conducted with an interleaved file system testbed on the Butterfly Plus multiprocessor. Results of these experiments suggest that 1) the hit ratio, the accepted measure in traditional caching studies, may not be an adequate measure of performance when the workload consists of parallel computations and parallel file access patterns, 2) caching with prefetching can significantly improve the hit ratio and the average time to perform an I/O operation, and 3) an improvement in overall execution time has been observed in most cases. In spite of these gains, prefetching sometimes results in increased execution times (a negative result, given the optimistic nature of the study).

We explore why it is not trivial to translate savings on individual I/O requests into consistently better overall performance, and we identify the key problems that need to be addressed in order to improve the potential of prefetching techniques in this environment.
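
The first point can be seen with simple arithmetic: average I/O time is h·t_hit + (1−h)·t_miss plus any policy overhead, so raising the hit ratio h need not lower wall-clock time once overhead and lost overlap are charged. The C snippet below works one such example with assumed, purely illustrative costs.

    #include <stdio.h>

    int main(void)
    {
        double t_hit = 0.1, t_miss = 30.0;    /* ms; assumed costs */
        double h_base = 0.50, h_pref = 0.90;  /* hit ratios */
        double overhead = 2.0;                /* ms of prefetch work/ref */

        double base = h_base * t_hit + (1 - h_base) * t_miss;
        double pref = h_pref * t_hit + (1 - h_pref) * t_miss + overhead;
        printf("avg I/O time: baseline %.2f ms, prefetching %.2f ms\n",
               base, pref);
        /* Even when prefetching wins here, overall execution time also
         * depends on how much I/O overlaps computation. */
        return 0;
    }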


1989:
Carla Schlatter Ellis and David Kotz. Prefetching in File Systems for MIMD Multiprocessors. Proceedings of the International Conference on Parallel Processing (ICPP). August 1989. [Details]

The problem of providing file I/O to parallel programs has been largely neglected in the development of multiprocessor systems. There are two essential elements of any file system design intended for a highly parallel environment: parallel I/O and effective caching schemes. This paper concentrates on the second aspect of file system design and specifically, on the question of whether prefetching blocks of the file into the block cache can effectively reduce overall execution time of a parallel computation, even under favorable assumptions.

Experiments have been conducted with an interleaved file system testbed on the Butterfly Plus multiprocessor. Results of these experiments suggest that 1) the hit ratio, the accepted measure in traditional caching studies, may not be an adequate measure of performance when the workload consists of parallel computations and parallel file access patterns, 2) caching with prefetching can significantly improve the hit ratio and the average time to perform an I/O operation, and 3) an improvement in overall execution time has been observed in most cases. In spite of these gains, prefetching sometimes results in increased execution times (a negative result, given the optimistic nature of the study).

We explore why it is not trivial to translate savings on individual I/O requests into consistently better overall performance, and we identify the key problems that need to be addressed in order to improve the potential of prefetching techniques in this environment.


David Kotz. High-performance File System Design for MIMD Parallel Processors. A talk presented at the DARPA Workshop on Parallel Processing at UMIACS, August 1989. Audiovisual presentation. [Details]
1988:
Carla Schlatter Ellis and David Kotz. Prefetching in File Systems for MIMD Multiprocessors. Technical Report, November 1988. [Details]

The problem of providing file I/O to parallel programs has been largely neglected in the development of multiprocessor systems. There are two essential elements of any file system design intended for a highly parallel environment: parallel I/O and effective caching schemes. This paper concentrates on the second aspect of file system design and specifically, on the question of whether prefetching blocks of the file into the block cache can effectively reduce overall execution time of a parallel computation. MIMD multiprocessor architectures have a profound impact on the nature of the workloads they support. In particular, it is the collective behavior of the processes in a parallel computation that often determines the performance. The assumptions about file access patterns that underlie much of the work in uniprocessor file management are called into question. Results from experiments performed on the Butterfly Plus multiprocessor are presented showing the benefits that can be derived from prefetching (e.g. significant improvements in the cache miss ratio and the average time to perform an I/O operation). We explore why it is not trivial to translate these gains into much better overall performance.


[Kotz research]