This project is no longer active; this page is no longer updated.
Related projects: [Armada], [CHARISMA], [Galley], [Parallel-I/O], [RAPID-Transit]
Related keywords: [pario], [software]
Large parallel computing systems, especially those used for scientific computation, consume and produce huge amounts of data. Providing the necessary semantics for parallel processes accessing a file, and the necessary throughput for an application working with terabytes of data, requires a multiprocessor file system.
In the STARFISH project we developed the concept of disk-directed I/O, in which the application process requests a large parallel data transfer to or from a parallel file, and the file system then arranges the transfer of information between disks and memory in a way that suits the disks' own timing. The results show strong performance benefits, but only if suitable interfaces allow the application to make such requests known to the file system at a high level. The most complete paper is [kotz:jdiskdir]. An overview was presented in a 1994 talk at NASA [video].
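To make the idea concrete, here is a minimal sketch in C of a disk-directed collective read. It is not STARFISH's actual interface; the function dd_read_collective, the block-cyclic owner_of_block mapping, and the block and processor counts are all assumptions for illustration. The point is only that the application issues one high-level request for the whole transfer, and the I/O side then chooses the order of disk accesses and routes each block to the compute processor that owns it.

    /*
     * Hypothetical sketch (not STARFISH's actual interface) of a
     * disk-directed collective read.  The application describes the whole
     * transfer in one high-level request; the I/O processor then visits
     * file blocks in whatever order suits the disk and ships each block to
     * the compute processor that owns it under an assumed block-cyclic
     * distribution.
     */
    #include <stdio.h>
    #include <stdlib.h>

    #define BLOCK_SIZE 4096   /* bytes per file block (assumed) */
    #define NUM_CP     4      /* number of compute processors (assumed) */

    /* Which compute processor owns a given file block (block-cyclic). */
    static int owner_of_block(long block) {
        return (int)(block % NUM_CP);
    }

    /* Stand-in for moving a block between disk and a compute node's memory. */
    static void deliver_block(long block, int cp) {
        printf("disk block %ld -> compute processor %d\n", block, cp);
    }

    static int cmp_long(const void *a, const void *b) {
        long x = *(const long *)a, y = *(const long *)b;
        return (x > y) - (x < y);
    }

    /* One high-level request; the I/O side chooses the disk schedule. */
    static void dd_read_collective(long offset, long length) {
        long first = offset / BLOCK_SIZE;
        long last  = (offset + length - 1) / BLOCK_SIZE;
        long nblocks = last - first + 1;

        long *schedule = malloc(nblocks * sizeof *schedule);
        if (schedule == NULL)
            return;
        for (long i = 0; i < nblocks; i++)
            schedule[i] = first + i;

        /* "Disk order" approximated here by ascending block number. */
        qsort(schedule, (size_t)nblocks, sizeof *schedule, cmp_long);

        for (long i = 0; i < nblocks; i++)
            deliver_block(schedule[i], owner_of_block(schedule[i]));

        free(schedule);
    }

    int main(void) {
        dd_read_collective(0, 8L * BLOCK_SIZE);   /* one collective request */
        return 0;
    }

In a real system the schedule would come from the disk's own scheduler and the data would move over the interconnect; ascending block numbers and printf stand in for both here.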
STARFISH is a simulator for experimenting with concepts in parallel file systems. It is based on Eric Brewer's Proteus simulator from MIT, version 3.01, and runs only on (MIPS-based) DECstations.
The name: STARFISH is an acronym (Simulation Tool for Advanced Research in File Systems), but it also fits the maritime theme of the Proteus simulator on which it is based (Proteus was a Greek god of the sea).
Warning: I provide the code as-is, with little cleanup or added documentation. Some of the code is out of date and may have bugs; other parts are incomplete. Many of the analysis scripts are fragile. The code is constantly evolving, and new public releases may be rare. But many people have asked me for it, so here it is.
See other warnings in the README file.
Usage rules: You're welcome to look at the code and even try to run it, but I really don't have time to help you out much. If you publish any results based on this simulator, please cite me and provide the URL for this page.
Copying rules: This package may be freely copied as long as it is kept intact with my name on it. You may not sell it for commercial purposes (hah! as if anyone would pay for it). Please send me a note if you have a copy of this code, so I can keep track of how many copies there are, send you email about new versions, and so forth. Please ask me before you distribute any modified version.
Note regarding kotz:diskdir: the Version 2 code evolved after the experiments in some of those papers were run; in particular, the OSDI results were based on an earlier, buggier version of iopfs-cache. See the TR version of that paper for the correct results.
This project was supported largely by the US National Science Foundation under award CCR-940919.
The views and conclusions contained on this site and in its documents are those of the authors and should not be interpreted as necessarily representing the official position or policies, either expressed or implied, of the sponsor(s). Any mention of specific companies or products does not imply any endorsement by the authors or by the sponsor(s).
Papers are listed in reverse-chronological order.
Recent parallel file-system usage studies show that writes to write-only files are a dominant part of the workload. Therefore, optimizing writes could have a significant impact on overall performance. In this paper, we propose ENWRICH, a compute-processor write-caching scheme for write-only files in parallel file systems. ENWRICH combines low-overhead write caching at the compute processors with high performance disk-directed I/O at the I/O processors to achieve both low latency and high bandwidth. This combination facilitates the use of the powerful disk-directed I/O technique independent of any particular choice of interface. By collecting writes over many files and applications, ENWRICH lets the I/O processors optimize disk I/O over a large pool of requests. We evaluate our design via simulated implementation and show that ENWRICH achieves high performance for various configurations and workloads.
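As a rough illustration of the caching half of this design, the following C sketch (assumed names, sizes, and layout; not the paper's implementation) accumulates small writes in a per-compute-processor cache and hands the whole batch to an I/O processor when the cache fills or the file is closed, at which point the I/O processor is free to schedule the pooled writes with disk-directed I/O.

    /*
     * Hypothetical sketch of compute-processor write caching in the spirit
     * of ENWRICH (assumed names and sizes, not the paper's implementation).
     * Small writes to a write-only file accumulate in a local cache; a
     * flush hands the whole batch to an I/O processor, which can then
     * schedule the pooled writes with disk-directed I/O.
     */
    #include <stdio.h>
    #include <string.h>

    #define CACHE_SLOTS 8      /* cached writes per compute processor (assumed) */
    #define SLOT_BYTES  512    /* maximum bytes cached per write (assumed) */

    struct cached_write {
        long   file_offset;
        size_t len;
        char   data[SLOT_BYTES];
    };

    static struct cached_write cache[CACHE_SLOTS];
    static int cached = 0;

    /* Stand-in for shipping the batch to an I/O processor. */
    static void flush_to_iop(void) {
        printf("flushing %d cached writes to the I/O processor\n", cached);
        for (int i = 0; i < cached; i++)
            printf("  offset %ld, %zu bytes\n", cache[i].file_offset, cache[i].len);
        cached = 0;
    }

    /* Cache a small write locally; flush only when the cache fills. */
    static void enwrich_write(long offset, const void *buf, size_t len) {
        if (len > SLOT_BYTES)
            len = SLOT_BYTES;              /* keep the sketch simple */
        if (cached == CACHE_SLOTS)
            flush_to_iop();
        cache[cached].file_offset = offset;
        cache[cached].len = len;
        memcpy(cache[cached].data, buf, len);
        cached++;
    }

    int main(void) {
        char record[64] = "result record";
        for (long i = 0; i < 20; i++)        /* many small writes */
            enwrich_write(i * 64, record, sizeof record);
        flush_to_iop();                      /* final flush, e.g. at file close */
        return 0;
    }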
Of course, computational processes sharing a node with a file-system service may receive less CPU time, network bandwidth, and memory bandwidth than they would on a computation-only node. In this paper we begin to examine this issue experimentally. We found that high-performance I/O does not necessarily require substantial CPU time, leaving plenty of time for application computation. There were some complex file-system requests, however, which left little CPU time available to the application. (The impact on network and memory bandwidth still needs to be determined.) For applications (or users) that cannot tolerate an occasional interruption, we recommend that they continue to use only compute nodes. For tolerant applications needing more cycles than those provided by the compute nodes, we recommend that they take full advantage of both compute and I/O nodes for computation, and that operating systems should make this possible.