BACKGROUND ---------- First we state that all algorithms expect the number of samples and coefficients to be a power of 2, and that sampling is done at the Chebyshev nodes. The NAIVE algorithm is just the projection of the function onto the Legendre functions, sampled at the Chebyshev nodes (zeroes of T_2n). The SEMINAIVE algorithm is an O(N^2) algorithm that performs projections in frequency space rather than time space, i.e., it projects the function to be transformed onto cosine series representations of the associated Legendre functions. This is a fast algorithm in practice, provided that the cosine series representations of the Legendre functions have been precomputed. Note: in performing a forward spherical transform via the seminaive algorithm, the total size of the cosine series representations (the "precomputed data") grows rather quickly as the bandwidth of the problem increases: bw = 64 -> about 0.34 megabytes of precomputed data bw = 128 -> about 2.69 megabytes of precomputed data bw = 256 -> about 21.5 megabytes of precomputed data bw = 512 -> about 171 megabytes of precomputed data bw = 1024 -> about 1367 megabytes of precomputed data Depending on memory available and disk storage capabilities, you may want to compute the cosine series as needed (``on the fly") or read the precomputed data off disk. In its original form, the basic DRISCOLL-HEALY (DH) algorithm computes a spherical transform of a function with harmonics of at most order N in O(N^2 (log N)^2) time. This is an asymptotic result, exact in exact arithmetic. The key component of this algorithm is a fast Legendre transform which, assuming a precomputed data structure of size O(N log N), can be performed in O(N (log N)^2) operations. The fast Legendre transform algorithm works as follows. The original problem of size N is reduced (by smoothing and subsampling) to two smaller problems, each of size N/2. Then those two smaller problems are themselves smoothed and subsampled. So now there are four subproblems, each of size N/4. One continues to divide-and-conquer "all the way down." In terms of implementation, however, one would not divide-and- conquer "all the way down." Overhead costs would render the program computationally inefficient. This is one reason why more computer-friendly variations of the fast Legendre transform algorithm were developed. Another reason for developing variations of the above Legendre transform was the unstable nature of the original algorithm. An integral part of the algorithm is its use of shifted Legendre polynomials. As the order m of the transform increases, these polynomials become numerically suspect. Therefore, one needs to take care on how they are used. One can do this with the HYBRID ALGORITHM. Our experience indicates that the HYBRID ALGORITHM is the fastest of the variant algorithms. One can make the HYBRID LEGENDRE TRANSFORM algorithm numerically stable by varying a number of settings, but which setting is optimal (in terms of runtime and accuracy) depends on the platform the code is run on. The settings provided here can be thought of as starting points. You must tailor them to your computer. Note: in performing a forward spherical transform via the HYBRID ALGORITHM, the total size of the precomputed data grows quickly, but not as quickly as the seminaive transform's needs. The exact growth is difficult to predict since it depends on how much (and how) the hybrid algorithm is used in a spherical transform. But to at least have some figures to compare, here are the sizes of precomputed data as we have used the hybrid algorithm: bw = 64 -> about 0.32 megabytes of precomputed data bw = 128 -> about 2.31 megabytes of precomputed data bw = 256 -> about 17.6 megabytes of precomputed data bw = 512 -> about 136 megabytes of precomputed data bw = 1024 -> about 869 megabytes of precomputed data Note that at bw = 1024 the hybrid-based spherical transform uses significantly less precomputed data than the seminaive spherical transform. But this is for us - your mileage may vary !!! As we said above, the hybrid algorithm's performance appears to be rather architecture dependent. For example, let bw = 1024, and order m = 0. The hybrid Legendre transform was faster than the seminaive Legendre transform on both a DEC Alpha and HP Exemplar. However, at order m = 512 (same bandwidth) the hybrid algorithm was faster than the seminaive algorithm on the HP, but the reverse was true on the DEC! So again, performance depends on how the hybrid algorithm is used. But in terms of precomputed data size, the seminaive sizes may be taken as a sort of "maximum" for the hybrid algorithm (if the hybrid algorithm was used in the "worst" possible way). One final set of numbers: on the DEC and HP, the seminaive forward spherical transform and hybrid forward spherical transform were roughly competitive at bw = 512. On the HP, at bw = 1024 we found the hybrid algorithm to be roughly 30% faster, in terms of cpu and walltime. The code was not tested at bw = 1024 on the DEC because of hardware limitations (i.e. not enough available disk space).
Return to SpharmonicKit