World Scientific

MASSIVELY PARALLEL I/O FOR PARTITIONED SOLVER SYSTEMS

    This paper investigates I/O approaches for massively parallel partitioned solver systems. Typically, such systems have synchronized "loops" and write data in a well-defined block I/O format consisting of a header and a data portion. Our target use for such a parallel I/O subsystem is checkpoint-restart, where writing is by far the most common operation and reading typically happens only during initialization or during a restart after a system failure. We compare four parallel I/O strategies: POSIX file-per-processor (1PFPP), "poor man's" parallel I/O (PMPIO), synchronized parallel I/O (syncIO), and a "reduced-blocking" strategy (rbIO). Performance tests executed on the Blue Gene/P at Argonne National Laboratory, using real CFD solver data from PHASTA (an unstructured-grid finite element Navier-Stokes solver), show that the syncIO strategy can achieve a read bandwidth of 47.4 GB/sec and a write bandwidth of 27.5 GB/sec using 128K processors. The reduced-blocking rbIO strategy achieves an actual write bandwidth of 17.8 GB/sec and a perceived write bandwidth of 166 TB/sec on Blue Gene/P using 128K processors.
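    The header-plus-data block format and the simplest of the four strategies, 1PFPP (one POSIX file per processor), can be sketched as below. This is a minimal illustration, not PHASTA's actual on-disk format: the file-name pattern, header fields, and "SOLVER" tag are assumptions made for the example, and MPI ranks are simulated with a plain loop rather than real MPI calls.

    ```c
    #include <stdio.h>

    /* Illustrative header for the block I/O format described in the
     * abstract: a small fixed header followed by a raw data portion.
     * Field names and the tag string are hypothetical. */
    typedef struct {
        char tag[8];   /* format identifier, e.g. "SOLVER" (assumed) */
        int  rank;     /* writing rank */
        long ndata;    /* number of doubles in the data portion */
    } Header;

    /* 1PFPP: each rank writes its own file named by its rank. */
    static int write_part(int rank, const double *data, long n) {
        char name[64];
        snprintf(name, sizeof name, "part.%d.dat", rank);
        FILE *f = fopen(name, "wb");
        if (!f) return -1;
        Header h = { "SOLVER", rank, n };
        fwrite(&h, sizeof h, 1, f);                 /* header first */
        fwrite(data, sizeof(double), (size_t)n, f); /* then data block */
        fclose(f);
        return 0;
    }

    int main(void) {
        double buf[4] = {0.0, 1.0, 2.0, 3.0};
        for (int rank = 0; rank < 4; rank++)  /* simulate 4 ranks */
            if (write_part(rank, buf, 4) != 0) return 1;
        printf("wrote 4 per-rank files\n");
        return 0;
    }
    ```

    The appeal of 1PFPP is that every write is independent and contention-free, but at the scales studied in the paper (128K processors) it produces one file per rank, which stresses the parallel file system's metadata handling; the other three strategies trade this off differently by aggregating or deferring writes.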
