Convergent Time-Varying Regression Models for Data Streams: Tracking Concept Drift by the Recursive Parzen-Based Generalized Regression Neural Networks
Abstract
One of the greatest challenges in data mining is the processing and analysis of massive data streams. In contrast to traditional static data mining problems, data streams require that each element be processed only once, that the amount of allocated memory be constant, and that the models adapt to changes in the investigated streams. The vast majority of available methods have been developed for data stream classification; only a few address regression problems, and those rely on various heuristic approaches. In this paper, we develop mathematically justified regression models that work in a time-varying environment. More specifically, we study incremental versions of generalized regression neural networks, called IGRNNs, and we prove their tracking properties, namely weak (in probability) and strong (with probability one) convergence, under various concept drift scenarios. First, we present the IGRNNs, based on Parzen kernels, for modeling stationary systems under nonstationary noise. Next, we extend our approach to modeling time-varying systems under nonstationary noise. We present several types of concept drift that our approach can handle in such a way that weak and strong convergence hold under certain conditions. Then, in a series of simulations, we compare our method with commonly used heuristic approaches to concept drift, based on forgetting mechanisms or sliding windows. Finally, we apply our approach to a real-life scenario: the prediction of currency exchange rates.
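To make the idea of a recursive Parzen-kernel regression estimate concrete, the following is a minimal sketch and not the IGRNN procedure analyzed in the paper. It assumes a Gaussian kernel, a bandwidth schedule h_n = h0 * n^(-decay), and a fixed grid of query points so that memory stays constant; the class name `RecursiveKernelRegressor` and the parameters `h0`, `decay`, and `forget` are hypothetical names introduced only for illustration. The optional `forget` factor mimics the heuristic forgetting mechanism mentioned above; it is not part of the convergent scheme.

```python
import numpy as np


class RecursiveKernelRegressor:
    """Illustrative recursive kernel regression on a fixed query grid.

    Each stream element (x, y) is seen once and then discarded; only two
    arrays of the grid's size are kept, so memory use is constant.
    """

    def __init__(self, grid, h0=0.5, decay=0.2, forget=1.0):
        self.grid = np.asarray(grid, dtype=float)
        self.h0 = h0          # assumed initial bandwidth
        self.decay = decay    # assumed bandwidth decay exponent
        self.forget = forget  # 1.0 = no forgetting (heuristic if < 1.0)
        self.n = 0
        self.num = np.zeros_like(self.grid)  # running sum of y_i * K_i(x)
        self.den = np.zeros_like(self.grid)  # running sum of K_i(x)

    def update(self, x, y):
        """Fold one stream element into the running sums."""
        self.n += 1
        h = self.h0 * self.n ** (-self.decay)  # shrinking bandwidth h_n
        u = (self.grid - x) / h
        k = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * h)
        self.num = self.forget * self.num + y * k
        self.den = self.forget * self.den + k

    def predict(self):
        """Return the current regression estimate at every grid point."""
        out = np.full_like(self.grid, np.nan)
        mask = self.den > 1e-12  # avoid dividing by a vanishing density
        out[mask] = self.num[mask] / self.den[mask]
        return out
```

A typical use would be to call `update(x, y)` once per arriving pair and read `predict()` whenever an estimate on the grid is needed. Setting `forget` slightly below 1.0 down-weights old data, which is the kind of heuristic the abstract contrasts with the provably convergent approach.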