World Scientific
CAN AGENT INTELLIGENCE BE USED TO ACHIEVE FAULT TOLERANT PARALLEL COMPUTING SYSTEMS?

    The work reported in this paper aims to validate an alternative approach to fault tolerance, motivated by the limitations of traditional methods such as checkpointing, which constrain effective fault tolerance. Can agent intelligence be used to achieve fault tolerant parallel computing systems? If so, three further questions need to be addressed: "What agent capabilities are required for fault tolerance?", "What parallel computational tasks can benefit from such agent capabilities?" and "How can agent capabilities be implemented for fault tolerance?" The cognitive capabilities essential for achieving fault tolerance through agents are considered. Parallel reduction algorithms are identified as a class of algorithms that can benefit from cognitive agent capabilities. The Message Passing Interface is used to implement an intelligent agent based approach. Preliminary experimental results validate the feasibility of an agent based approach for achieving fault tolerance in parallel computing systems.
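
    For readers unfamiliar with the term, a parallel reduction combines many values into one by merging them pairwise over several rounds, so each intermediate node holds a partial result that later rounds depend on. The sketch below simulates that pattern sequentially in plain Python. It is an illustrative assumption only, not the paper's agent-based method or its MPI implementation, and the name `tree_reduce` is hypothetical.

```python
from operator import add


def tree_reduce(values, op=add):
    """Combine values pairwise in rounds, mimicking a binary-tree reduction.

    Returns the reduced value and the number of rounds taken.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if not values:
        raise ValueError("tree_reduce needs at least one value")
    level = list(values)
    rounds = 0
    while len(level) > 1:
        # Each pair stands in for two nodes sending partial results to a parent.
        merged = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # An unpaired node carries its partial result into the next round.
            merged.append(level[-1])
        level = merged
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    total, rounds = tree_reduce([1, 2, 3, 4, 5, 6, 7, 8])
    print(total, rounds)
```

    Because every round consumes the partial results of the previous one, losing a single intermediate node loses the contribution of its whole subtree, which is one way to see why this class of algorithms is a natural test case for fault tolerance.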
