High-dimensional data poses unique challenges for outlier detection. Most existing algorithms fail to properly address the issues that stem from a large number of features. In particular, outlier detection algorithms perform poorly on small datasets with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis and kernel density estimation. The proposed method addresses the challenges of high-dimensional data by projecting the original data onto a lower-dimensional space and using the innate structure of the data to calculate an anomaly score for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. In particular, the proposed method outperforms the benchmark methods as measured by F1-score. Its execution times are also better than the average of the benchmark methods.
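The abstract does not spell out the exact scoring rule, so the following is only a minimal, dependency-free sketch of the general idea it describes: project the data onto a few principal components, then score each point by how low a kernel density estimate is at its projected location. The function names, the power-iteration PCA, the Gaussian product kernel, the rule-of-thumb bandwidth, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def principal_components(rows, k, iters=300, seed=0):
    """Approximate the top-k principal directions by power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to capture
                break
            v = [x / norm for x in w]
        eigval = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflate: remove the variance explained by v before the next search.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return mean, components


def project(rows, mean, components):
    """Coordinates of each row in the reduced principal-component space."""
    d = len(mean)
    return [[sum((r[j] - mean[j]) * c[j] for j in range(d)) for c in components]
            for r in rows]


def kde_anomaly_scores(points):
    """Negative log of a Gaussian product-kernel density estimate at each point."""
    n, k = len(points), len(points[0])
    # Rule-of-thumb bandwidth per dimension (an assumption of this sketch).
    bandwidths = []
    for j in range(k):
        col = [p[j] for p in points]
        mu = sum(col) / n
        std = math.sqrt(sum((x - mu) ** 2 for x in col) / (n - 1))
        bandwidths.append(max(std, 1e-12) * n ** (-1.0 / (k + 4)))
    norm_const = n * (2 * math.pi) ** (k / 2) * math.prod(bandwidths)
    scores = []
    for p in points:
        total = sum(
            math.exp(-0.5 * sum(((p[j] - q[j]) / bandwidths[j]) ** 2
                                for j in range(k)))
            for q in points
        )
        scores.append(-math.log(total / norm_const))  # low density -> high score
    return scores


def pca_kde_scores(rows, k=2):
    """Hypothetical end-to-end helper: PCA projection followed by KDE scoring."""
    mean, components = principal_components(rows, k)
    return kde_anomaly_scores(project(rows, mean, components))


# Toy data: 40 points near a line in 5 dimensions, plus one point far along it.
data = []
for i in range(40):
    t = -1 + 2 * i / 39
    data.append([t + 0.05 * math.sin(i * (j + 1)) for j in range(5)])
data.append([6.0] * 5)  # planted outlier at index 40

scores = pca_kde_scores(data, k=2)
most_anomalous = scores.index(max(scores))
```

In this toy setup the planted point receives the highest score because no other point lies within a few bandwidths of it in the projected space. A practical implementation would more likely rely on a numerical library for the decomposition and density estimate, and would tune the number of components and the bandwidth to the data.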
