World Scientific
  • Search
Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×
Our website is made possible by displaying certain online content using javascript.
In order to view the full content, please disable your ad blocker or whitelist our website www.worldscientific.com.

System Upgrade on Tue, Oct 25th, 2022 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at [email protected] for any enquiries.

TWO PARALLEL 1-D FFT ALGORITHMS WITHOUT ALL-TO-ALL COMMUNICATION

    Computing the 1-D Fast Fourier Transform (FFT) using the conventional six-step FFT algorithm on parallel computers requires intensive all-to-all communication due to the necessity of matrix transpose in three steps. This all-to-all communication is a limiting factor in improving the performance of FFT in its parallel implementations. In this paper, we present two parallel algorithms for implementing the 1-D FFT without all-to-all communication between processors, at the expense of increased inner-processor computation as compared to the conventional six-step FFT algorithm. Our analysis reveals the advantage of these two algorithms over the six-step FFT algorithm in parallel systems where the cost of inter-processor communication outweighs the cost of inner-processor computation. As a case study, we choose a 32-node Beowulf cluster with fast processors (running at 2 GHz) but relatively slow inter-processor communication (over a 100 Mbit/s switch). Simulation results on this cluster demonstrate that the proposed no-communication FFT algorithms can achieve a speedup ranging from 1.1 to 1.5 over the six-step FFT algorithm.

    References

    • P. N. Swarztrauber, Parallel Computing 5, 197 (1987). Crossref, ISIGoogle Scholar
    • P. N. Swarztrauber and S. W. Hammond, Parallel Computing 27, 847 (2001). Crossref, ISIGoogle Scholar
    • D. H. Bailey, Journal of Supercomputing 4, 23 (1990). Crossref, ISIGoogle Scholar
    • I. Gertner and M. Rofheart, IEEE Trans. Parallel and Distributed Systems 1, 377 (1990). CrossrefGoogle Scholar
    • I. Gertner, IEEE Trans. Acoust., Speech, and Signal Processing ASSP-26, 1036 (1988). Google Scholar
    • C. Luet al., Hybrid parallel FFT algorithm without interprocessor communication, Proc. of IEEE International Conference on ASSP pp. iii-281–iii-284. Google Scholar
    • G. Kechriotiset al., IEEE Trans. Signal Processing 43, 272 (1995). Crossref, ISIGoogle Scholar
    • F. Marino and E. Swartzlander, IEEE Trans. Computers 48, 951 (1999). Crossref, ISIGoogle Scholar
    • F. Marino, V. Piuri and E. Swartzlander, IEEE Trans. Signal Processing 47, 3179 (1999). Crossref, ISIGoogle Scholar
    • H.   Shan et al. , Message passing vs. shared address space on a cluster of SMPs , Proc. 15th International Parallel and Distributed Processing Symposium . Google Scholar
    • S. Johnsson and C.-T. Ho, Optimal all-to-all personalized communication with minimum span on boolean cubes, Proc. Distributed Memory Computing Conference pp. 299–304. Google Scholar
    • D. Scott, Efficient all-to-all communication patterns in hypercube and mesh topologies, Proc. Distributed Memory Computing Conference pp. 398–403. Google Scholar