Title

使用權重動態視窗之密度導向的局部離群值偵測演算法

Translated Titles

A New Density-Based Local Outlier Detection Algorithm Using Weighted Dynamic Window

Authors

廖又葳

Key Words

Local outlier detection ; Density-based outlier detection ; Local outlier ; Unsupervised Outlier Detection ; Anomaly Detection ; Density-Based Outlier Factor ; Dynamic window ; Weighting assignment ; Weighted dynamic window

PublicationName

National Chung Hsing University, Department of Computer Science and Engineering, degree theses

Volume or Term/Year and Month of Publication

2017

Academic Degree Category

Master's

Advisor

吳俊霖

Content Language

English

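Chinese Abstract (English translation)

An outlier is a data sample that is inconsistent with the rest of the dataset. Outlier detection is an important research topic in data analysis and pattern recognition, and it is widely applied in fields such as industry, multimedia, business, and engineering. This study focuses on detecting local outliers, that is, samples that are highly dissimilar to their surrounding data rather than to the global dataset. The existing density-based outlier detection algorithm, the local outlier factor (LOF), has the following problems: (1) it cannot find outliers effectively when the density distribution of the dataset is uneven (it contains both dense and sparse clusters) or when clusters overlap slightly; (2) it is sensitive to the choice of the parameter k when searching for the k nearest neighbors, and this choice can easily affect detection accuracy. This study therefore proposes a density-based local outlier detection algorithm using a weighted dynamic window. In the unsupervised setting, it addresses the problems above by expanding a dynamic window and assigning weights, with the main aim of giving more relevant data greater influence. In the experiments, we test the proposed algorithm on synthetic and real-world data, and the results verify that the proposed method is more robust and detects outliers more effectively.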

English Abstract

An outlier is an observation that is distant from the other observations. Outlier detection is an important research topic in data analysis and pattern recognition, and it has been widely used in various knowledge domains. This study focuses on local outlier detection, i.e., detecting a sample that is dissimilar to its surrounding data rather than to the global dataset. The existing density-based outlier detection algorithm, local outlier factor (LOF), has the following problems: (1) it does not perform well when the dataset is imbalanced or when its density distributions overlap; (2) it is sensitive to the choice of the parameter k used in finding nearest neighbors. This study aims to implement a better-performing density-based outlier detection method that solves the above problems. By using a dynamic window and weighting assignment, the proposed method detects outliers effectively and robustly. Experiments on synthetic and real-world datasets demonstrate that the proposed method yields robust and excellent performance.
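For orientation, the baseline that the abstract criticizes can be sketched in a few lines. The code below is a minimal, assumption-laden illustration of the standard LOF score (k-distance, reachability distance, local reachability density), not the weighted dynamic window method proposed in the thesis. The function names `knn` and `lof_scores` and the toy data are hypothetical, and the neighborhood is simplified to exactly k neighbors with ties broken by index.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (ties broken by index)."""
    others = (j for j in range(len(points)) if j != i)
    order = sorted(others, key=lambda j: (math.dist(points[i], points[j]), j))
    return order[:k]


def lof_scores(points, k):
    """Standard LOF score per point; assumes no more than k duplicate points."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour.
    kdist = [math.dist(points[i], points[neigh[i][-1]]) for i in range(n)]

    def reach(i, j):
        # Reachability distance of point i with respect to neighbour j.
        return max(kdist[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach(i, j) for j in neigh[i]) for i in range(n)]
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neigh[i]) / (k * lrd[i]) for i in range(n)]


# Toy data (an assumption for illustration): a 3x3 grid plus one distant point.
points = [(float(x), float(y)) for x in range(3) for y in range(3)]
points.append((10.0, 10.0))

# The score of the distant point depends on k, the sensitivity noted above.
for k in (2, 3, 5):
    print(k, round(lof_scores(points, k)[-1], 2))
```

Scores close to 1 indicate a point whose density matches that of its neighbors, while values well above 1 flag a local outlier. Running the loop with several values of k gives a rough feel for how the choice of k shifts the score.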

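Topic Category Basic and Applied Sciences > Information Science
College of Electrical Engineering and Computer Science > Department of Computer Science and Engineering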
Reference
  1. [2]. V. Chandola, A. Banerjee and V. Kumar, "Anomaly detection", ACM Computing Surveys, vol. 41, no. 3, pp. 1-58, 2009.
  2. [5]. S. Ramaswamy, R. Rastogi and K. Shim, "Efficient algorithms for mining outliers from large data sets", ACM SIGMOD Record, vol. 29, no. 2, pp. 427-438, 2000.
  3. [8]. H. Fan, O. Zaïane, A. Foss and J. Wu, "Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data", Knowledge and Information Systems, vol. 19, no. 1, pp. 31-51, 2008.
  4. [14]. E. Schubert, A. Zimek and H. Kriegel, "Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection", Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. 190-237, 2012.
  5. [1]. V. Barnett and T. Lewis, Outliers in statistical data, 3rd ed. Chichester [u.a.]: Wiley, 1994.
  6. [3]. E. Knorr and R. Ng, "Algorithms for mining distance-based outliers in large datasets", in Proceedings of the International Conference on Very Large Data Bases, pp. 392-403, 1998.
  7. [4]. E. Knorr, R. Ng and V. Tucakov, "Distance-based outliers: algorithms and applications", The VLDB Journal The International Journal on Very Large Data Bases, vol. 8, no. 3-4, pp. 237-253, 2000.
  8. [6]. M. Breunig, H. Kriegel, R. Ng and J. Sander, "LOF: identifying density-based local outliers", ACM SIGMOD Record, vol. 29, no. 2, pp. 93-104, 2000.
  9. [7]. R. Momtaz, N. Mohssen and M. Gowayyed, "DWOF: A Robust Density-Based Outlier Detection Approach", in Pattern Recognition and Image Analysis, Berlin, Heidelberg, pp. 517-525, 2013.
  10. [9]. E. Schubert, A. Koos, T. Emrich, A. Züfle, K. A. Schmid and A. Zimek, "A framework for clustering uncertain data", Proc. of the VLDB Endowment, vol. 8, no. 12, pp. 1976-1979, 2015.
  11. [10]. L. Fu and E. Medico, "FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data", BMC Bioinformatics, vol. 8, no. 1, p. 3, 2007.
  12. [11]. G. Markus, "Unsupervised Anomaly Detection Benchmark - Unsupervised Anomaly Detection Dataverse", Dataverse.harvard.edu, 2015. [Online]. Available: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/OPQMVF.
  13. [12]. S. Rayana, "ODDS Library", Odds.cs.stonybrook.edu, 2016. [Online]. Available: http://odds.cs.stonybrook.edu.
  14. [13]. Wikipedia contributors, "Normal distribution", Wikipedia.org, 2016. [Online]. Available: https://en.wikipedia.org/wiki/Normal_distribution.
  15. [15]. W. Jin, A. K. H. Tung and J. Han, "Mining top-n local outliers in large databases", in KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 293-298, 2001.
  16. [16]. K. Zhang, M. Hutter and H. Jin, "A new local distance-based outlier detection approach for scattered real-world data", in Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, pp. 813-822, 2009.