Dartmouth College Computer Science Technical Report series CS home TR home TR search TR listserv By author: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z By number: 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986

Asymptotically Tight Bounds for Performing BMMC Permutations on Parallel Disk Systems
Thomas H. Cormen, Thomas Sundquist, Leonard F. Wisniewski
Dartmouth PCS-TR94-223

Abstract:

We give asymptotically equal lower and upper bounds for the number of parallel I/O operations required to perform bit-matrix-multiply/complement (BMMC) permutations on parallel disk systems. In a BMMC permutation on N records, where N is a power of 2, each (lg N)-bit source address x maps to a corresponding (lg N)-bit target address y by the matrix equation y = Ax XOR c, where matrix multiplication is performed over GF(2). The characteristic matrix A is (lg N) x (lg N) and nonsingular over GF(2). Under the Vitter-Shriver parallel-disk model with N records, D disks, B records per block, and M records of memory, we show a universal lower bound of $\Omega \left( \frac{N}{BD} \left( 1 + \frac{\rank{\gamma}}{\lg (M/B)} \right) \right)$ parallel I/Os for performing a BMMC permutation, where gamma is the lower left (lg (N/B)) x (lg B) submatrix of the characteristic matrix. We adapt this lower bound to show that the algorithm for bit-permute/complement (BPC) permutations in Cormen93a is asymptotically optimal. We also present an algorithm that uses at most $\frac{2N}{BD} \left( 4 \ceil{\frac{\rank{\gamma}}{\lg (M/B)}} + 4 \right)$ parallel I/Os, which asymptotically matches the lower bound and improves upon the BMMC algorithm in Cormen93a. When rank (gamma) is low, this method is an improvement over the general-permutation bound of $\Theta \left( \frac{N}{BD} \frac{\lg (N/B)}{\lg (M/B)} \right)$.

We introduce a new subclass of BMMC permutations, called memory-load-dispersal (MLD) permutations, which can be performed in one pass. This subclass, which is used in the BMMC algorithm, extends the catalog of one-pass permutations appearing in Cormen93a.

Although many BMMC permutations of practical interest fall into subclasses that might be explicitly invoked within the source code, we show how to detect in at most $N/BD + \ceil{\frac{\lg (N/B) + 1}{D}}$ parallel I/Os whether a given vector of target addresses specifies a BMMC permutation. Thus, one can determine efficiently at run time whether a permutation to be performed is BMMC and then avoid the general-permutation algorithm and save parallel I/Os by using our algorithm.

compressed postscript .ps.Z (164KB) , PDF (396KB) (derived from the ps.Z)

Bibliographic citation for this report: [plain text] [BIB] [BibTeX] [Refer]

Or copy and paste:
Thomas H. Cormen, Thomas Sundquist, and Leonard F. Wisniewski, "Asymptotically Tight Bounds for Performing BMMC Permutations on Parallel Disk Systems." Dartmouth Computer Science Technical Report PCS-TR94-223, 1994.

To receive paper copy of a report, by mail, send your address and the TR number to reports AT cs.dartmouth.edu

Copyright notice: The documents contained in this server are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Technical reports collection maintained by David Kotz.