CAN AGENT INTELLIGENCE BE USED TO ACHIEVE FAULT TOLERANT PARALLEL COMPUTING SYSTEMS?
Abstract
The work reported in this paper aims to validate an alternative approach to fault tolerance, in place of traditional methods such as checkpointing, which constrain effective fault tolerance. Can agent intelligence be used to achieve fault tolerant parallel computing systems? If so, three further questions need to be addressed: what agent capabilities are required for fault tolerance, what parallel computational tasks can benefit from such capabilities, and how these capabilities can be implemented. The cognitive capabilities essential for achieving fault tolerance through agents are considered. Parallel reduction algorithms are identified as a class of algorithms that can benefit from cognitive agent capabilities. An intelligent agent based approach is implemented using the Message Passing Interface (MPI). Preliminary experimental results support the feasibility of an agent based approach for achieving fault tolerance in parallel computing systems.


