Implementing Gene Expression Programming in the Parallel Environment for Big Datasets’ Classification
Abstract
The paper investigates a Gene Expression Programming (GEP)-based ensemble classifier constructed using the stacked generalization concept. The classifier has been implemented to enable parallel processing using Spark and SWIM, an open-source genetic programming library. It has been validated in computational experiments carried out on benchmark datasets. The paper also investigates how selected settings influence the results. It extends a previous paper by the same authors.