  • Thesis

應用於資料探勘上之高效能硬體架構設計

High Performance Hardware Enhanced Frameworks on Data Mining

Advisor: 陳銘憲 (Ming-Syan Chen)

Abstract


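Using hardware to accelerate data mining algorithms is an emerging topic. In this thesis, we propose hardware architectures for frequent temporal pattern mining and for data clustering. Each architecture uses hardware parallelism to accelerate the most time-consuming steps of its algorithm and so raises the overall data processing speed. For frequent temporal pattern mining, we propose a circuit for an Apriori-like algorithm that handles frequent 2-itemsets, whose number grows quadratically as the number of items increases. With this circuit, the frequent 1-itemsets and 2-itemsets are both obtained after a single scan of the data, within a fixed amount of time. For data clustering, we integrate a hardware centroid updating mechanism into the execution flow of the clustering algorithm, which greatly reduces the time spent on centroid updates. The experimental results show that, compared with data mining algorithms executed entirely in software, hardware acceleration yields considerable performance improvements.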
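The single-scan counting of 1-itemsets and 2-itemsets described in the abstract can be illustrated in software. The following is a minimal sketch only, not the circuit proposed in the thesis: it runs sequentially, whereas the hardware relies on parallelism. The function names `count_one_pass` and `frequent` and the `min_support` threshold are hypothetical names introduced for illustration.

```python
from collections import Counter
from itertools import combinations


def count_one_pass(transactions):
    """Count every 1-itemset and 2-itemset in a single scan of the data."""
    single_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each pair has one canonical form.
        items = sorted(set(transaction))
        single_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return single_counts, pair_counts


def frequent(counts, min_support):
    """Keep only the itemsets whose count reaches min_support (assumed threshold)."""
    return {key: n for key, n in counts.items() if n >= min_support}


transactions = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a"]]
singles, pairs = count_one_pass(transactions)
frequent_pairs = frequent(pairs, min_support=2)
```

With n distinct items there are n(n-1)/2 possible pairs, so the pair counter is the part that grows quadratically. That is the workload the abstract assigns to hardware.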

Parallel Abstract


Hardware-enhanced mining is an emerging topic. In this thesis, we propose two frameworks that speed up two mining problems: temporal pattern mining in data streams and the K-means clustering algorithm. By exploiting the parallelism of hardware, many primitive data mining subtasks can be executed with high throughput, which increases the performance of the overall data mining task. Specifically, for temporal pattern mining we realize an Apriori-like algorithm within our proposed hardware-enhanced mining framework. Even though the number of 2-itemsets grows quadratically, the counts of frequent 1-itemsets and 2-itemsets are obtained after one pass of the dataset through our hardware implementation, so the throughput is maintained at a constant level. Moreover, we propose the KACU (K-means with hArdware Centroid Updating) framework, which integrates a hardware centroid updating mechanism into the procedure of the continuous K-means algorithm. The proposed hardware frameworks are implemented on commercial Field Programmable Gate Array (FPGA) devices in order to measure their performance. The experimental results show that the hardware enhancements achieve considerably higher performance than traditional mining algorithms implemented purely in software.
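For readers unfamiliar with continuous K-means, the centroid update that KACU moves into hardware can be sketched in software as follows. This is a hedged illustration of the standard incremental-mean form of the algorithm, not the thesis's hardware mechanism. The names `nearest` and `continuous_kmeans`, and the choice to weight each initial centroid as one sample, are assumptions made for the example.

```python
def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    best, best_dist = 0, float("inf")
    for i, centroid in enumerate(centroids):
        dist = sum((p - q) ** 2 for p, q in zip(point, centroid))
        if dist < best_dist:
            best, best_dist = i, dist
    return best


def continuous_kmeans(points, initial_centroids):
    """Assign each point in turn and update its centroid immediately."""
    centroids = [list(c) for c in initial_centroids]
    # Assumption: each initial centroid counts as one already-seen sample.
    counts = [1] * len(centroids)
    labels = []
    for point in points:
        k = nearest(point, centroids)
        counts[k] += 1
        # Incremental mean: move the centroid toward the point by 1/count.
        centroids[k] = [c + (p - c) / counts[k]
                        for c, p in zip(centroids[k], point)]
        labels.append(k)
    return centroids, labels


points = [(0.0, 0.0), (10.0, 10.0), (2.0, 0.0)]
centroids, labels = continuous_kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The update runs once per data point rather than once per pass, which is why the abstract identifies centroid updating as the step worth accelerating.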

