Looking at those files that were shared by multiple nodes, we found that this access pattern occurred frequently in practice. The figure below shows that many of the accesses to CFS and CMMD files appeared to be part of a simple-strided access pattern. Since consecutive access could be considered a trivial form of strided access (with an interval of 0), the figure shows the frequency of strided accesses with and without consecutive accesses included.
In either case, over 80% of all the mutli-node files in CFS were apparently accessed entirely with a strided pattern. Strided access was also common in CMMD, with over 60% of the files being accessed entirely in a strided, non-consecutive pattern. If we exclude consecutive access, there appeared to be almost no strided access in CMF, with no more than 20% of the requests to any file taking part in a strided pattern. This lack of strided access in CMF is not surprising, since strided access is typically caused by the explicit expression of data distribution in a control-parallel program. Accordingly, the remainder of our discussion will focus on CFS and CMMD.
We define a strided segment to be a group of requests that appear to be part of a single simple-strided pattern. While the figure above shows the percentage of requests that were involved in some strided segment, it does not tell us whether each file was accessed with a single, file-long strided segment, or with many shorter segments. The figure below shows that while most files had only a few strided segments, there were some files that were accessed with many strided segments.
The number of requests in a segment varied between the machines. Most segments in CFS fell into the range of 20 to 30 requests and most of the segments in CMMD had 55 to 65 requests. There were some files that were accessed with much longer segments on both systems.
While the existence of these simple-strided patterns is interesting and potentially useful, the fact that many files were accessed in multiple short segments suggests that there was a level of structure beyond that described by a simple-strided pattern.
The simple-strided sub-pattern corresponds to the requests generated within each row of the matrix, while the top-level pattern corresponds to the distance between one row and the next. This access pattern could also be generated by an application that was reading a single column of data from a three-dimensional matrix. Higher levels of nesting could occur if an application mapped a multidimensional matrix onto a set of processors.
Maximum Level Number of Number of of Nesting CFS files CMMD files ------------------------------------ 0 469 38 1 10945 311 2 747 102 3 5151 148 4+ 0 3
by Nils Nieuwejaar