CHARISMA (CHARacterize I/O in Scientific Multiprocessor Applications) project (1993-1996)

This project is no longer active; this page is no longer updated.

Related projects: [Galley], [Parallel-I/O], [RAPID-Transit], [STARFISH]

Related keywords: [pario]


Summary

Large parallel computing systems, especially those used for scientific computation, consume and produce huge amounts of data. Providing the necessary semantics for parallel processes accessing a file, and the necessary throughput for an application working with terabytes of data, requires a multiprocessor file system.

One of the big challenges facing research on parallel file systems was to develop a solid understanding of the workload: what do parallel programmers actually do with parallel file systems? In June 1993 we launched a cooperative effort, called CHARISMA (CHARacterize I/O in Scientific Multiprocessor Applications), to collect and analyze file-system traces from multiple applications on several different file systems. The CHARISMA project was unique in recording individual read and write requests in live, multiprogramming, parallel workloads (rather than from selected or non-parallel applications). The resulting papers are among the few studies to characterize production parallel computer systems.

Most parallel file systems (e.g., Intel's CFS and Thinking Machines' SFS) were designed around the assumption that scientific applications running on parallel computers would exhibit behavior similar to that of scientific applications running on uniprocessors and vector supercomputers.

The primary characteristics of file access in those environments are:

To test the validity of that assumption, we traced the workloads of two different parallel file systems, on two different machines, at two different sites, running primarily scientific applications. The tracing involved recording every single access that was made to the parallel file system over a period of weeks. The trace format is described in a C include file.
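The record layout itself is defined in that include file; purely as an illustration of the kind of per-request information such a trace carries, a record might resemble the sketch below (the field names and types here are assumptions, not the actual CHARISMA format).

    /* Hypothetical sketch only: the real CHARISMA trace format is defined in
     * the project's C include file; these field names and types are
     * illustrative assumptions, not that format. */
    struct io_trace_record {
        double timestamp;   /* when the request was issued                    */
        int    node;        /* compute node that issued it                    */
        int    file_id;     /* which open file the request refers to          */
        int    op;          /* operation: open, close, read, write, seek, ... */
        long   offset;      /* file offset at which the request began         */
        long   nbytes;      /* number of bytes requested                      */
    };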

The two machines we traced were an Intel iPSC/860 at NASA Ames' Numerical Aerodynamic Simulation facility and a Thinking Machines CM-5 at the National Center for Supercomputing Applications. All parallel file access on the iPSC was done through Intel's Concurrent File System (CFS). Parallel applications on the CM-5 could use either the data-parallel CMF I/O library or the control-parallel CMMD I/O library.

Results

by Nils Nieuwejaar

Our observations may be summarized as follows:

We examined the millions of small, noncontiguous requests in greater detail, and found that most of them appeared to be part of regular, higher-level patterns, as described below.

Simple-Strided

We refer to a series of I/O requests as a simple-strided access pattern if each request is for the same number of bytes, and if the file pointer is incremented by the same amount between each request. Two possible situations in which this pattern could arise are shown below. The first case shows the columns of a matrix distributed across the processors in a cyclic pattern, when the number of columns is a multiple of the number of processors. The second case shows the rows of a matrix distributed across the processors. In this case, each processor reads an entire row at a time, and then skips to the beginning of the next row.
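As a minimal sketch of the second case (the row-major layout and the round-robin assignment of rows to nodes are assumptions added for illustration, not code from any traced application), the per-node loop below issues fixed-size requests separated by a fixed stride:

    /* Simple-strided sketch: node `me` of `nprocs` reads the rows assigned to
     * it round-robin from an nrows x ncols matrix of doubles stored row-major.
     * Every request is one full row (fixed size) and the file pointer advances
     * by the same amount between requests (fixed stride). */
    #include <stdio.h>

    void read_my_rows(FILE *f, double *buf, int me, int nprocs,
                      long nrows, long ncols)
    {
        long rowbytes = ncols * (long)sizeof(double);   /* request size       */
        long stride   = nprocs * rowbytes;              /* constant increment */

        fseek(f, me * rowbytes, SEEK_SET);              /* first row owned by this node */
        for (long r = me, k = 0; r < nrows; r += nprocs, k++) {
            fread(buf + k * ncols, sizeof(double), ncols, f);
            fseek(f, stride - rowbytes, SEEK_CUR);      /* skip rows owned by other nodes */
        }
    }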

Looking at those files that were shared by multiple nodes, we found that this access pattern occurred frequently in practice. The figure below shows that many of the accesses to CFS and CMMD files appeared to be part of a simple-strided access pattern. Since consecutive access could be considered a trivial form of strided access (with an interval of 0), the figure shows the frequency of strided accesses with and without consecutive accesses included.

In either case, over 80% of all the multi-node files in CFS were apparently accessed entirely with a strided pattern. Strided access was also common in CMMD, with over 60% of the files being accessed entirely in a strided, non-consecutive pattern. If we exclude consecutive access, there appeared to be almost no strided access in CMF, with no more than 20% of the requests to any file taking part in a strided pattern. This lack of strided access in CMF is not surprising, since strided access is typically caused by the explicit expression of data distribution in a control-parallel program. Accordingly, the remainder of our discussion will focus on CFS and CMMD.

We define a strided segment to be a group of requests that appear to be part of a single simple-strided pattern. While the figure above shows the percentage of requests that were involved in some strided segment, it does not tell us whether each file was accessed with a single, file-long strided segment, or with many shorter segments. The figure below shows that while most files had only a few strided segments, there were some files that were accessed with many strided segments.
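As a rough sketch of how such segments could be identified offline (the request-record layout and the grouping rule here are assumptions for illustration, not the project's actual analysis code), one pass over a node's per-file request stream is enough to count them:

    /* Count how many simple-strided segments make up one node's sequence of
     * requests to one file.  A request extends the current segment when its
     * size matches and its distance from the previous request equals the
     * interval established by the segment's first two requests. */
    struct req { long offset; long nbytes; };

    long count_strided_segments(const struct req *r, long n)
    {
        if (n < 2)
            return n;                       /* 0 or 1 requests: nothing to group */

        long segments  = 1;
        long seg_start = 0;                 /* index of the current segment's first request */

        for (long i = 1; i < n; i++) {
            long gap       = r[i].offset - r[i - 1].offset;
            int  same_size = (r[i].nbytes == r[seg_start].nbytes);
            int  same_gap  = (i == seg_start + 1) ||   /* the second request fixes the interval */
                             (gap == r[seg_start + 1].offset - r[seg_start].offset);

            if (!(same_size && same_gap)) { /* pattern broken: start a new segment */
                segments++;
                seg_start = i;
            }
        }
        return segments;
    }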

The number of requests in a segment varied between the machines. Most segments in CFS fell into the range of 20 to 30 requests and most of the segments in CMMD had 55 to 65 requests. There were some files that were accessed with much longer segments on both systems.

While the existence of these simple-strided patterns is interesting and potentially useful, the fact that many files were accessed in multiple short segments suggests that there was a level of structure beyond that described by a simple-strided pattern.

Nested Patterns

A nested-strided access pattern is similar to a simple-strided access pattern, but rather than being composed of simple requests separated by regular strides in the file, it is composed of strided segments separated by regular strides in the file. The simple-strided patterns examined above could be called singly-nested patterns. A doubly-nested pattern could correspond to the pattern generated by an application that distributed the columns of a matrix stored in row-major order across its processors in a cyclic pattern, if the columns could not be distributed evenly across the processors (below, left). Another possible source of such an access pattern is a matrix distributed in a block-cyclic pattern (below, right).

The simple-strided sub-pattern corresponds to the requests generated within each row of the matrix, while the top-level pattern corresponds to the distance between one row and the next. This access pattern could also be generated by an application that was reading a single column of data from a three-dimensional matrix. Higher levels of nesting could occur if an application mapped a multidimensional matrix onto a set of processors.
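As a minimal sketch of the column-cyclic case (the layout details are assumptions for illustration, not code from any traced application), the doubly-nested loop below produces exactly this two-level structure:

    /* Doubly-nested sketch: node `me` of `nprocs` reads its cyclically-assigned
     * columns of an nrows x ncols matrix of doubles stored row-major.  The
     * inner loop is a simple-strided sub-pattern (one element per request, a
     * stride of nprocs elements within a row); the outer loop adds a second,
     * larger stride from the start of one row to the start of the next. */
    #include <stdio.h>

    void read_my_columns(FILE *f, double *buf, int me, int nprocs,
                         long nrows, long ncols)
    {
        long k = 0;
        for (long r = 0; r < nrows; r++) {               /* outer stride: one row      */
            for (long c = me; c < ncols; c += nprocs) {  /* inner stride: nprocs elems */
                fseek(f, (r * ncols + c) * (long)sizeof(double), SEEK_SET);
                fread(&buf[k++], sizeof(double), 1, f);
            }
        }
    }

When ncols happens to be a multiple of nprocs, the gap across each row boundary equals the inner stride and the whole pattern collapses into the simple-strided case described earlier; adding a loop over a third matrix dimension would add a third stride level.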


Maximum Level  Number of  Number of
of Nesting     CFS files  CMMD files
------------------------------------
    0             469         38
    1           10945        311
    2             747        102
    3            5151        148
    4+              0          3

The above table shows how frequently nested patterns occurred in CFS and CMMD. A file that had no apparent strided accesses had zero levels of nesting. Files that were accessed with only simple-strided patterns had a single level of nesting. Interestingly, on both machines it was far more common for files to exhibit three levels of nesting than two. This tendency suggests that the use of multidimensional matrices was common on both systems.


People

Funding and acknowledgements

This research was supported in part by the US NASA Ames Research Center under Agreement Number NCC 2-849, the US National Science Foundation under grant number CCR-9113170, the US National Center for Supercomputing Applications, and Thinking Machines Corporation.

The views and conclusions contained on this site and in its documents are those of the authors and should not be interpreted as necessarily representing the official position or policies, either expressed or implied, of the sponsor(s). Any mention of specific companies or products does not imply any endorsement by the authors or by the sponsor(s).


Papers tagged 'charisma'


Papers are listed in reverse-chronological order.

1996:
Nils Nieuwejaar, David Kotz, Apratim Purakayastha, Carla Schlatter Ellis, and Michael Best. File-Access Characteristics of Parallel Scientific Workloads. IEEE Transactions on Parallel and Distributed Systems. October 1996. [Details]

Phenomenal improvements in the computational performance of multiprocessors have not been matched by comparable gains in I/O system performance. This imbalance has resulted in I/O becoming a significant bottleneck for many scientific applications. One key to overcoming this bottleneck is improving the performance of multiprocessor file systems.

The design of a high-performance multiprocessor file system requires a comprehensive understanding of the expected workload. Unfortunately, until recently, no general workload studies of multiprocessor file systems have been conducted. The goal of the CHARISMA project was to remedy this problem by characterizing the behavior of several production workloads, on different machines, at the level of individual reads and writes. The first set of results from the CHARISMA project describe the workloads observed on an Intel iPSC/860 and a Thinking Machines CM-5. This paper is intended to compare and contrast these two workloads for an understanding of their essential similarities and differences, isolating common trends and platform-dependent variances. Using this comparison, we are able to gain more insight into the general principles that should guide multiprocessor file-system design.


1995:
Nils Nieuwejaar, David Kotz, Apratim Purakayastha, Carla Schlatter Ellis, and Michael Best. File-Access Characteristics of Parallel Scientific Workloads. Technical Report, August 1995. [Details]

Phenomenal improvements in the computational performance of multiprocessors have not been matched by comparable gains in I/O system performance. This imbalance has resulted in I/O becoming a significant bottleneck for many scientific applications. One key to overcoming this bottleneck is improving the performance of parallel file systems.

The design of a high-performance parallel file system requires a comprehensive understanding of the expected workload. Unfortunately, until recently, no general workload studies of parallel file systems have been conducted. The goal of the CHARISMA project was to remedy this problem by characterizing the behavior of several production workloads, on different machines, at the level of individual reads and writes. The first set of results from the CHARISMA project describe the workloads observed on an Intel iPSC/860 and a Thinking Machines CM-5. This paper is intended to compare and contrast these two workloads for an understanding of their essential similarities and differences, isolating common trends and platform-dependent variances. Using this comparison, we are able to gain more insight into the general principles that should guide parallel file-system design.


Apratim Purakayastha, Carla Schlatter Ellis, David Kotz, Nils Nieuwejaar, and Michael Best. Characterizing Parallel File-Access Patterns on a Large-Scale Multiprocessor. Proceedings of the International Parallel Processing Symposium (IPPS). April 1995. [Details]

High-performance parallel file systems are needed to satisfy tremendous I/O requirements of parallel scientific applications. The design of such high-performance parallel file systems depends on a comprehensive understanding of the expected workload, but so far there have been very few usage studies of multiprocessor file systems. This paper is part of the CHARISMA project, which intends to fill this void by measuring real file-system workloads on various production parallel machines. In particular, here we present results from the CM-5 at the National Center for Supercomputing Applications. Our results are unique because we collect information about nearly every individual I/O request from the mix of jobs running on the machine. Analysis of the traces leads to various recommendations for parallel file-system design.

David Kotz and Nils Nieuwejaar. File-System Workload on a Scientific Multiprocessor. IEEE Parallel and Distributed Technology. Spring 1995. [Details]

The Charisma project records individual read and write requests in live, multiprogramming parallel workloads. This information can be used to design more efficient multiprocessor systems. We present the first results from the project: a characterization of the file-system workload on an iPSC/860 multiprocessor running production, parallel scientific applications at NASA Ames Research Center. We use the resulting information to address the following questions: What did the job mix look like (that is, how many jobs ran concurrently?) How many files were read and written? Which were temporary files? What were their sizes? What were typical read and write request sizes, and how were they spaced in the file? Were the accesses sequential? What forms of locality were there? How might caching be useful? What are the implications for file-system design?

1994:
David Kotz and Nils Nieuwejaar. Dynamic File-Access Characteristics of a Production Parallel Scientific Workload. Proceedings of Supercomputing. November 1994. [Details]

Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors.

Most successful systems are based on a solid understanding of the characteristics of the expected workload, but until now there have been no comprehensive workload characterizations of multiprocessor file systems. We began the CHARISMA project in an attempt to fill that gap. We instrumented the common node library on the iPSC/860 at NASA Ames to record all file-related activity over a two-week period. Our instrumentation is different from previous efforts in that it collects information about every read and write request and about the mix of jobs running in the machine (rather than from selected applications).

The trace analysis in this paper leads to many recommendations for designers of multiprocessor file systems. First, the file system should support simultaneous access to many different files by many jobs. Second, it should expect to see many small requests, predominantly sequential and regular access patterns (although of a different form than in uniprocessors), little or no concurrent file-sharing between jobs, significant byte- and block-sharing between processes within jobs, and strong interprocess locality. Third, our trace-driven simulations showed that these characteristics led to great success in caching, both at the compute nodes and at the I/O nodes. Finally, we recommend supporting strided I/O requests in the file-system interface, to reduce overhead and allow more performance optimization by the file system.


Apratim Purakayastha, Carla Schlatter Ellis, David Kotz, Nils Nieuwejaar, and Michael Best. Characterizing Parallel File-Access Patterns on a Large-Scale Multiprocessor. Technical Report, October 1994. [Details]

Rapid increases in the computational speeds of multiprocessors have not been matched by corresponding performance enhancements in the I/O subsystem. To satisfy the large and growing I/O requirements of some parallel scientific applications, we need parallel file systems that can provide high-bandwidth and high-volume data transfer between the I/O subsystem and thousands of processors.

Design of such high-performance parallel file systems depends on a thorough grasp of the expected workload. So far there have been no comprehensive usage studies of multiprocessor file systems. Our CHARISMA project intends to fill this void. The first results from our study involve an iPSC/860 at NASA Ames. This paper presents results from a different platform, the CM-5 at the National Center for Supercomputing Applications. The CHARISMA studies are unique because we collect information about every individual read and write request and about the entire mix of applications running on the machines.

The results of our trace analysis lead to recommendations for parallel file system design. First, the file system should support efficient concurrent access to many files, and I/O requests from many jobs under varying load conditions. Second, it must efficiently manage large files kept open for long periods. Third, it should expect to see small requests, predominantly sequential access patterns, application-wide synchronous access, no concurrent file-sharing between jobs, appreciable byte and block sharing between processes within jobs, and strong interprocess locality. Finally, the trace data suggest that node-level write caches and collective I/O request interfaces may be useful in certain environments.


David Kotz and Nils Nieuwejaar. Dynamic File-Access Characteristics of a Production Parallel Scientific Workload. Technical Report, April 1994. Revised May 11, 1994. [Details]

Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors.

Most successful systems are based on a solid understanding of the characteristics of the expected workload, but until now there have been no comprehensive workload characterizations of multiprocessor file systems. We began the CHARISMA project in an attempt to fill that gap. We instrumented the common node library on the iPSC/860 at NASA Ames to record all file-related activity over a two-week period. Our instrumentation is different from previous efforts in that it collects information about every read and write request and about the mix of jobs running in the machine (rather than from selected applications).

The trace analysis in this paper leads to many recommendations for designers of multiprocessor file systems. First, the file system should support simultaneous access to many different files by many jobs. Second, it should expect to see many small requests, predominantly sequential and regular access patterns (although of a different form than in uniprocessors), little or no concurrent file-sharing between jobs, significant byte- and block-sharing between processes within jobs, and strong interprocess locality. Third, our trace-driven simulations showed that these characteristics led to great success in caching, both at the compute nodes and at the I/O nodes. Finally, we recommend supporting strided I/O requests in the file-system interface, to reduce overhead and allow more performance optimization by the file system.


