A Fast Improved Density-Based Clustering Algorithm

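As information technology advances, the amount of information stored in databases grows daily, and data mining techniques for quickly extracting information from large databases have become a prominent topic in data clustering research. This study proposes a new density-based clustering algorithm named QIDBSCAN (Quick IDBSCAN), which builds on the density-based clustering framework and improves its expansion procedure to reduce time cost. The method follows the density-based frameworks proposed by earlier researchers and, without adding to the complexity of the framework, also has the advantages of density-based algorithms that use nonlinear merging, so the proposed algorithm does reduce time cost. Experimental results verify that QIDBSCAN clusters data correctly and reduces the time required for clustering, while its clustering accuracy and noise filtering rate match those of the algorithms proposed by earlier researchers. The experiments therefore indicate that the proposed QIDBSCAN algorithm is both efficient and highly feasible.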

Keywords

data mining; data clustering; density-based clustering

Parallel Abstract

Of the many data clustering algorithms proposed in recent years, the most effective are the density-based clustering algorithms DBSCAN and IDBSCAN. Although density-based clustering methods are effective at identifying graphs, filtering out noise, and obtaining good clustering results, they are extremely time consuming. IDBSCAN is faster than DBSCAN, but its speed is still unsatisfactory. This thesis therefore develops QIDBSCAN (Quick IDBSCAN), a new data clustering algorithm based on IDBSCAN that uses MBOs (Marked Boundary Objects) to expand clusters directly, without actually selecting data points. The experimental results in this study confirm that QIDBSCAN is substantially faster than IDBSCAN, DBSCAN, and other density-based algorithms.
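For orientation, the expansion step that the abstract refers to can be illustrated with a minimal sketch of the baseline DBSCAN procedure. This is not the QIDBSCAN method itself, whose details are not given in the abstract; the function names `region_query` and `dbscan`, the brute-force neighborhood search, and the sample data are assumptions made only for illustration. The sketch shows where the time cost arises: every point pulled into a cluster triggers another neighborhood query.

```python
from math import dist

NOISE = -1      # label for points that belong to no cluster
UNVISITED = 0   # label for points not yet processed

def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Baseline DBSCAN sketch: one label per point (-1 = noise, 1..k = cluster id)."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be relabeled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        # Expansion: each unvisited seed costs one more region query.
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point, keep expanding
                seeds.extend(k for k in j_neighbors
                             if labels[k] in (UNVISITED, NOISE))
    return labels

# Hypothetical sample data: two dense squares and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(sample, eps=1.5, min_pts=3)
```

Sampling-based variants such as IDBSCAN aim to cut the number of these queries by expanding from only a few representative seeds per neighborhood; according to the abstract, QIDBSCAN reduces the cost further by expanding directly from the MBOs.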

Parallel Keywords

data clustering; data mining; density-based clustering

