TWO PARALLEL 1-D FFT ALGORITHMS WITHOUT ALL-TO-ALL COMMUNICATION
Abstract
Computing the 1-D Fast Fourier Transform (FFT) using the conventional six-step FFT algorithm on parallel computers requires intensive all-to-all communication due to the necessity of matrix transpose in three steps. This all-to-all communication is a limiting factor in improving the performance of FFT in its parallel implementations. In this paper, we present two parallel algorithms for implementing the 1-D FFT without all-to-all communication between processors, at the expense of increased inner-processor computation as compared to the conventional six-step FFT algorithm. Our analysis reveals the advantage of these two algorithms over the six-step FFT algorithm in parallel systems where the cost of inter-processor communication outweighs the cost of inner-processor computation. As a case study, we choose a 32-node Beowulf cluster with fast processors (running at 2 GHz) but relatively slow inter-processor communication (over a 100 Mbit/s switch). Simulation results on this cluster demonstrate that the proposed no-communication FFT algorithms can achieve a speedup ranging from 1.1 to 1.5 over the six-step FFT algorithm.
References
- Parallel Computing 5, 197 (1987). Crossref, ISI, Google Scholar
- Parallel Computing 27, 847 (2001). Crossref, ISI, Google Scholar
- Journal of Supercomputing 4, 23 (1990). Crossref, ISI, Google Scholar
- IEEE Trans. Parallel and Distributed Systems 1, 377 (1990). Crossref, Google Scholar
- IEEE Trans. Acoust., Speech, and Signal Processing ASSP-26, 1036 (1988). Google Scholar
C. Lu , Hybrid parallel FFT algorithm without interprocessor communication, Proc. of IEEE International Conference on ASSP pp. iii-281–iii-284. Google Scholar- IEEE Trans. Signal Processing 43, 272 (1995). Crossref, ISI, Google Scholar
- IEEE Trans. Computers 48, 951 (1999). Crossref, ISI, Google Scholar
- IEEE Trans. Signal Processing 47, 3179 (1999). Crossref, ISI, Google Scholar
-
H. Shan , Message passing vs. shared address space on a cluster of SMPs , Proc. 15th International Parallel and Distributed Processing Symposium . Google Scholar S. Johnsson and C.-T. Ho , Optimal all-to-all personalized communication with minimum span on boolean cubes, Proc. Distributed Memory Computing Conference pp. 299–304. Google ScholarD. Scott , Efficient all-to-all communication patterns in hypercube and mesh topologies, Proc. Distributed Memory Computing Conference pp. 398–403. Google Scholar


