A PARALLEL PROBABILISTIC SYSTEM-LEVEL FAULT DIAGNOSIS APPROACH FOR LARGE MULTIPROCESSOR SYSTEMS
Abstract
In this paper, we present a system-level fault identification algorithm, using a parallel genetic algorithm, for diagnosing faulty nodes in large heterogeneous systems. The algorithm is based on a probabilistic model where individual node fails with an a priori probability p. The assumptions concerning test outcomes are the same as in the PMC model, that is, fault-free testers always give correct test outcomes and faulty testers are totally unpredictable. The parallel diagnosis algorithm was implemented and simulated on randomly generated large systems. The proposed parallelization is intended to speed up the performance of the evolutionary diagnosis approach, hence reducing the computation time by evolving various sub-populations in parallel. Simulation results are provided showing that the parallel diagnosis did improve the efficiency of the evolutionary diagnosis approach, in that it allowed faster diagnosis of faulty situations, making it a viable alternative to existing techniques of diagnosis. Moreover, the evolutionary approach still provide good results even when extreme non-diagnosable faulty situations are considered.
References
- ACM Computing Surveys 25(2), 171 (1993), DOI: 10.1145/152610.152612. Crossref, ISI, Google Scholar
- IEEE Trans. on Computers C 25, 585 (1976). ISI, Google Scholar
- D. M. Blough. Fault Detection and Diagnosis in Multiprocessor Systems. PhD thesis, The Johns Hopkins University, Baltimore, Md., 1988 . Google Scholar
- IEEE Trans. on Computers 42(2), 205 (1993), DOI: 10.1109/12.204793. Crossref, ISI, Google Scholar
- IEEE Trans. on Computers 41(9), 1126 (1992), DOI: 10.1109/12.165394. Crossref, ISI, Google Scholar
M. Blount , Probabilistic Treatment of Diagnosis in Digital Systems, Proc. 7th Int. Symp. on Fault-Tolerant Comput. (IEEE Comput. Sot. Publ., 1977) pp. 72–77. Google Scholar-
D. A. Coley , An Introduction to Genetic Algorithms for Scientists and Engineers ( World Scientific , Singapore; River Edge, NJ ) . Google Scholar - IEEE Trans. on Computers C 35, (1986). Google Scholar
- L. N. de Castro and F. J. Von Zuben. Artificial Immune Systems: Part I - Basic Theory and Applications. Technical Report Technical Report RT DCA 01/99, 1999 . Google Scholar
M. Elhadef , Fault Identification in Probabilistically Diagnosable Heterogeneous Systems: A Genetic Approach, Proc. of the 5th Int. Conf. on Principles of Distributed Systems (2001) pp. 55–70. Google ScholarM. Elhadef and B. Ayeb , An Evolutionary Algorithm for Identifying Faults in t-Diagnosable Systems, Proc. of the 19th IEEE Symp. on Reliable Distributed Systems (SRDS-2000) (2000) pp. 74–83. Google ScholarM. Elhadef and D. A. Coley , Adaptive Mutation for Semi-Separable Problems, Proc. of the Genetic and Evolutionary Computation Conference (GECCO-2001) (2001) pp. 306–312. Google Scholar-
D. B. Fogel , Evolutionary Computation. Towards a New Philosophy of Machine Intelligence , 2nd edn. ( IEEE Press , New York , 2000 ) . Google Scholar -
A. Geist , PVM: Parallel Virtual Machine: A Users' Guide and Tutorial for Network Parallel Computing ,Scientific and Engineering Computation ( The MIT Press , 1994 ) . Crossref, Google Scholar -
V. S. Gordon and D. Whitley , Serial and Parallel Genetic Algorithms as Function Optimizers , Proc. of the 5th Int. Conf. on Genetic Algorithms ( 1993 ) . Google Scholar - Inform. Contr. 29, 141 (1975). Crossref, Google Scholar
-
Zdenek Konfrt , Parallel Genetic Algorithms: Advances, Computing Trends, Applications and Perspectives , Proc. of 18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Workshop ( 2004 ) . Google Scholar S. Kreutzer and S. L. Hakimi , Adaptive Fault Identification in Two New Diagnostic Models, Proc. 21st Allerton Conf. on Commun., Cont. and Comput. (Univ. of Illinois, Urbana, Ill, 1983) pp. 353–362. Google Scholar- S. Lee. Probabilistic Multiprocessor and Multicomputer Diagnosis. PhD thesis, The Univ. of Michigan, Ann Arbor, Mich., 1990 . Google Scholar
- ACM Computing Surveys 26(1), 121 (1994), DOI: 10.1145/174666.174669. Crossref, ISI, Google Scholar
- IEEE Trans. on Computers C 25, 228 (1976). ISI, Google Scholar
- , New Ideas in Optimization, eds.
D. Corne , M. Dorigo and F. Glover (McGraw-Hill, 1999) pp. 219–234. Google Scholar - IEEE Trans. on Electron. Comput. 40(11), 1271 (1991). Crossref, ISI, Google Scholar
- IEEE Trans. on Computers 47(3), 298 (1998). ISI, Google Scholar
- IEEE Trans. on Electron. Comput. 16(6), (1967). Google Scholar
- IEEE Trans. on Computers 41(11), 1386 (1992), DOI: 10.1109/12.177309. Crossref, ISI, Google Scholar
G. Sullivan , A Polynomial Time Algorithm for Fault Diagnosability, Proc. 25th Ann. Symp. on Foundations of Comput. Sci. (IEEE Comput. Soc. Publ., 1984) pp. 148–156. Google Scholar- Matthew Wall. A C++ Library of Genetic Algorithm Components. http://lancet.mit.edu/ga/ . Google Scholar


