Parallel File System Workload Characterization
Most parallel file systems (e.g., Intel's CFS, Thinking Machines' SFS) have
been designed around the assumption that scientific applications running on
parallel computers would exhibit behavior similar to that of scientific
applications running on uniprocessors and vector supercomputers.
The primary characteristics of file access in those environments are:
- Files are huge - hundreds of megabytes, gigabytes, or larger.
- Files are accessed in large pieces - hundreds of kilobytes or megabytes
at a time.
- Files are accessed sequentially. That is, every byte in the file is
accessed, in order, from beginning to end.
To test the validity of that assumption, we traced the workloads of two
different parallel file systems, on two different machines, at two different
sites, running primarily scientific applications. The tracing involved
recording every single access that was made to the parallel file system over
a period of weeks.
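As an illustration, the assumed characteristics (large requests, sequential
access to the whole file) can be checked mechanically against a stream of
requests. The following is a minimal sketch, not the tooling used in the
study: the record format (a list of (offset, size) pairs for one process and
one file) and the names `is_sequential` and `summarize` are hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical trace record: (byte offset, request size in bytes).
Request = Tuple[int, int]


def is_sequential(requests: List[Request]) -> bool:
    """True if the first request starts at offset 0 and every later
    request starts exactly where the previous one ended."""
    expected = 0
    for offset, size in requests:
        if offset != expected:
            return False
        expected = offset + size
    return True


def summarize(requests: List[Request], file_size: int) -> Dict[str, object]:
    """Summarize one non-empty request stream against the assumptions."""
    sequential = is_sequential(requests)
    total = sum(size for _, size in requests)
    return {
        "sequential": sequential,
        # Sequential from offset 0 and ending at the last byte of the file.
        "whole_file": sequential and total == file_size,
        "median_request_bytes": median([size for _, size in requests]),
    }
```

A stream of two 1 MB requests covering a 2 MB file would be reported as
sequential and whole-file, while three 128-byte requests spaced 512 bytes
apart would not.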
The two machines we traced were an Intel iPSC/860 at the Numerical
Aerodynamic Simulation facility and a Thinking Machines CM-5 at the
National Center for Supercomputing Applications. All parallel file access
on the iPSC was done through Intel's Concurrent File System. Parallel
applications on the CM-5 could use either the data-parallel CMF I/O library
or the control-parallel CMMD I/O library.
Our observations may be summarized as follows:
- Many parallel applications access files in small (64-256 bytes),
noncontiguous pieces.
- Within a single file, these pieces tend to be regularly sized.
- Many parallel applications use many different files in a single run.
- There is a great deal of interprocessor sharing of files.
- There is very little interjob sharing of files.
We examined the millions of small, noncontiguous requests in greater
detail, and found that most of them appeared to be part of a regular,
higher-level pattern.
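One simple form of regularity among small, noncontiguous requests is a fixed
request size combined with a fixed distance between request offsets. The
sketch below tests a request stream for that form. It is an illustration
only: the (offset, size) record format and the function name
`detect_fixed_stride` are hypothetical, and reading "regular" as "fixed size
and fixed spacing" is an assumption, not the analysis used in the study.

```python
from typing import List, Optional, Tuple

# Hypothetical trace record: (byte offset, request size in bytes).
Request = Tuple[int, int]


def detect_fixed_stride(requests: List[Request]) -> Optional[Tuple[int, int]]:
    """Return (size, stride) if every request has the same size and the
    offsets of consecutive requests differ by one constant; else None."""
    if len(requests) < 3:
        # Too few requests to call the spacing a pattern.
        return None
    sizes = {size for _, size in requests}
    strides = {b[0] - a[0] for a, b in zip(requests, requests[1:])}
    if len(sizes) == 1 and len(strides) == 1:
        return sizes.pop(), strides.pop()
    return None


# Example: one of four processes reading every fourth 128-byte record.
stream = [(256 + i * 512, 128) for i in range(5)]
pattern = detect_fixed_stride(stream)
```

Here `pattern` is `(128, 512)`: the requests are 128 bytes each and 512 bytes
apart. A stride larger than the request size means the requests are
noncontiguous; a stride equal to the size means they are consecutive.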
Nils A. Nieuwejaar
David Kotz <firstname.lastname@example.org>
Last modified: Tue Dec 12 10:41:14 2000