SOFTWARE SUPPORTS FOR EVENT PREEMPTIVE ROLLBACK IN OPTIMISTIC PARALLEL SIMULATION ON MYRINET CLUSTERS
Abstract
Optimistic synchronization protocols for parallel discrete event simulation employ rollback techniques to ensure causally consistent execution of simulation events. Event preemptive rollback (i.e., rollback in which the execution of an event is interrupted as soon as a message revealing a causality inconsistency arrives) is recognized as a way to increase the performance of this type of synchronization and to tackle its run-time anomalies. However, general purpose communication layers typically lack the functionalities needed to implement event preemptive rollback effectively. In this paper we present the design and implementation of a communication layer for Myrinet-based clusters that efficiently supports event preemptive rollback operations. Beyond standard low latency message delivery functionalities, this layer embeds functionalities that let the overlying simulation application efficiently track whether an incoming message will, upon its receipt at the application level, produce a causality inconsistency with the currently executed simulation event. With these functionalities, the application becomes aware of the inconsistency before the message is received at the application level, so event execution can be interrupted in time to activate rollback procedures. We also report experimental results demonstrating the effectiveness of our solution for a Personal Communication System (PCS) simulation application.
An earlier version of this article by the same authors appeared in the Proceedings of the 7th IEEE Symposium on Computers and Communications (ISCC'02).
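The tracking mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use any Myrinet API: the class CommLayer, its methods, and the function execute_event are hypothetical names introduced here for illustration only. The sketch assumes that a message causes a causality inconsistency when its timestamp is strictly lower than the timestamp of the event currently being executed, and that the application polls the layer between slices of event work.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class CommLayer:
    """Hypothetical stand-in for the communication layer (not a real API)."""
    # Timestamp of the event the application is currently executing.
    current_event_ts: Optional[float] = None
    # Timestamps of messages that arrived but were not yet received
    # at the application level.
    pending: List[float] = field(default_factory=list)

    def begin_event(self, ts: float) -> None:
        self.current_event_ts = ts

    def end_event(self) -> None:
        self.current_event_ts = None

    def on_arrival(self, msg_ts: float) -> None:
        # Called when a message reaches the layer, before the
        # application actually receives it.
        self.pending.append(msg_ts)

    def inconsistency_pending(self) -> bool:
        # Assumption: strict "<" means the message is in the past of
        # the event under execution, so the event must be rolled back.
        if self.current_event_ts is None:
            return False
        return any(ts < self.current_event_ts for ts in self.pending)


def execute_event(layer: CommLayer, event_ts: float, n_steps: int,
                  arrivals: Dict[int, float]) -> Tuple[str, int]:
    """Run an event in n_steps slices, polling the layer before each one.

    arrivals maps a step index to the timestamp of a message that
    reaches the layer just before that step (simulated input).
    """
    layer.begin_event(event_ts)
    try:
        for step in range(n_steps):
            if step in arrivals:
                layer.on_arrival(arrivals[step])
            if layer.inconsistency_pending():
                # Interrupt now; rollback would be activated here.
                return ("preempted", step)
            # One slice of event processing would go here.
        return ("completed", n_steps)
    finally:
        layer.end_event()
```

For example, execute_event(CommLayer(), 10.0, 5, {2: 7.5}) stops at step 2 because the message timestamp 7.5 precedes the event timestamp 10.0, while a message with timestamp 12.0 lets the event run to completion. Polling between slices is only one possible design; the point is that the check happens before the message is delivered to the application.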