MASSIVELY PARALLEL I/O FOR PARTITIONED SOLVER SYSTEMS
Abstract
This paper investigates I/O approaches for massively parallel partitioned solver systems. Such systems typically have synchronized "loops" and write data in a well-defined block I/O format consisting of a header and a data portion. Our target use for such a parallel I/O subsystem is checkpoint-restart, where writing is by far the most common operation and reading typically happens only during initialization or during a restart after a system failure. We compare four parallel I/O strategies: POSIX File Per Processor (1PFPP), "Poor-Man's" Parallel I/O (PMPIO), a synchronized parallel I/O (syncIO), and a "reduced-blocking" strategy (rbIO). Performance tests executed on the Blue Gene/P at Argonne National Laboratory using real CFD solver data from PHASTA (an unstructured-grid finite element Navier-Stokes solver) show that the syncIO strategy can achieve a read bandwidth of 47.4 GB/sec and a write bandwidth of 27.5 GB/sec using 128K processors. The rbIO strategy achieves an actual write bandwidth of 17.8 GB/sec and a perceived write bandwidth of 166 TB/sec on Blue Gene/P using 128K processors.
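To make the "header plus data portion" block format and the file-per-processor idea concrete, the following is a minimal Python sketch under stated assumptions. It is not PHASTA's actual file format or the authors' implementation: the header fields (a magic tag, rank, step, and value count), the file naming scheme `restart.<step>.<rank>`, and all function names are hypothetical. A serial loop over ranks stands in for what each processor would do independently in a 1PFPP-style checkpoint.

```python
import os
import struct
import tempfile

# Hypothetical header layout (an assumption, not PHASTA's real format):
# 4-byte magic tag, uint32 rank, uint32 step, uint64 count of float64 values.
# "<" selects little-endian with no padding, so the header is 20 bytes.
HEADER = struct.Struct("<4sIIQ")
MAGIC = b"CKPT"


def write_block(path, rank, step, values):
    """Write one block: a fixed-size header followed by the data portion."""
    payload = struct.pack(f"<{len(values)}d", *values)
    with open(path, "wb") as f:
        f.write(HEADER.pack(MAGIC, rank, step, len(values)))
        f.write(payload)


def read_block(path):
    """Read a block back, as would happen on initialization or restart."""
    with open(path, "rb") as f:
        magic, rank, step, count = HEADER.unpack(f.read(HEADER.size))
        if magic != MAGIC:
            raise ValueError(f"{path} is not a checkpoint block")
        values = list(struct.unpack(f"<{count}d", f.read(8 * count)))
    return rank, step, values


def checkpoint_file_per_rank(directory, step, fields_by_rank):
    """Serial stand-in for 1PFPP: every rank writes its own file."""
    paths = []
    for rank, values in sorted(fields_by_rank.items()):
        path = os.path.join(directory, f"restart.{step}.{rank}")
        write_block(path, rank, step, values)
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        written = checkpoint_file_per_rank(tmp, 7, {0: [1.0, 2.5], 1: [3.0]})
        for p in written:
            print(os.path.basename(p), read_block(p))
```

Because every rank owns its file, this pattern needs no coordination between writers, but the number of files grows with the processor count; the other strategies named in the abstract differ in how many files are written and which processes write them.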


