
Parallel Sequential Pattern Mining on Distributed Systems (應用於分散式系統之平行循序樣本探勘)

Advisor: 張昭憲

Abstract


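This study aims to improve the efficiency of sequential pattern mining by developing a distributed mining system that combines multiple computers. First, because predicting the workload of a subtask directly from its support is inaccurate, we propose a new workload prediction method. We then propose a load-balancing algorithm, ADLB, which combines the characteristics of dynamic and static allocation and dispatches tasks in stages to achieve load balance while reducing communication overhead. In addition, we adopt and improve the algorithm-switching concept of [20] to obtain a larger performance gain. To verify system performance, we combined 16 computers for distributed mining. The experimental results show that the system achieves a good speedup ratio for every number of nodes tested, indicating its potential for processing large datasets.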

Parallel Abstract (English)


This research develops a distributed mining system that coordinates multiple computers to improve the efficiency of sequential pattern mining. First, because predicting the workload of a subtask directly from its support is inaccurate, we propose a new workload prediction method. We then propose an Advanced Dynamic Load Balance algorithm, ADLB. Unlike previous work, ADLB divides subtask dispatching into several stages. Depending on the situation, static or dynamic load balancing is applied, which prevents a skewed task partition and reduces communication overhead at the same time. Furthermore, we adopt and improve the algorithm-switching concept of [20]: a suitable mining algorithm is chosen for each database, rather than a single algorithm being applied to all databases regardless of their features. To verify system performance, we combined sixteen computers for distributed mining. Compared with previous work, the experimental results show that ADLB effectively reduces the runtime and obtains a better speedup ratio. This result demonstrates the potential of ADLB for mining sequential patterns in very large databases.
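The abstract does not give the details of ADLB, so the following is only a minimal sketch of the general idea of staged dispatching, not the thesis's algorithm. It assumes a two-stage scheme: the tasks with the largest estimated workload are assigned statically up front (which needs no run-time messages), and the remaining tasks are handed out dynamically to whichever node becomes idle first. The function names, the `static_fraction` parameter, and the use of a simple simulation in place of real machines are all hypothetical.

```python
import heapq


def staged_dispatch(estimates, actuals, n_nodes, static_fraction=0.5):
    """Simulate a two-stage assignment; return each node's finish time.

    estimates[i] is the predicted workload of task i, actuals[i] its real
    running time. Both lists and static_fraction are illustrative inputs.
    """
    # Consider tasks from largest to smallest estimated workload.
    order = sorted(range(len(estimates)), key=lambda i: estimates[i], reverse=True)
    n_static = int(len(order) * static_fraction)

    # Stage 1 (static): give each task to the node with the smallest
    # estimated load so far. This is decided before mining starts.
    est_load = [(0.0, node) for node in range(n_nodes)]
    heapq.heapify(est_load)
    busy_until = [0.0] * n_nodes
    for i in order[:n_static]:
        load, node = heapq.heappop(est_load)
        heapq.heappush(est_load, (load + estimates[i], node))
        busy_until[node] += actuals[i]

    # Stage 2 (dynamic): each remaining task goes to the node that becomes
    # idle first, based on actual running times.
    idle = [(t, node) for node, t in enumerate(busy_until)]
    heapq.heapify(idle)
    for i in order[n_static:]:
        t, node = heapq.heappop(idle)
        heapq.heappush(idle, (t + actuals[i], node))

    finish = [0.0] * n_nodes
    for t, node in idle:
        finish[node] = t
    return finish


def speedup(actuals, finish):
    """Speedup ratio: single-node running time divided by the parallel makespan."""
    return sum(actuals) / max(finish)
```

For example, `staged_dispatch([4, 3, 2, 1], [4, 3, 2, 1], 2)` assigns the two largest tasks statically and the two smallest dynamically. In this sketch, a larger `static_fraction` means fewer run-time dispatch messages but more exposure to inaccurate workload estimates, which is the trade-off the abstract attributes to combining static and dynamic allocation.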

References


[19] 張昭憲 and 周定賢, “以動態任務分配為基礎之分散式循序樣本探勘系統” (A Distributed Sequential Pattern Mining System Based on Dynamic Task Allocation), 16th International Conference on Information Management, Taipei (Fu Jen Catholic University), May 2005.
[20] 張昭憲 and 黃揚智, “有效率的分散式循序樣本探勘系統” (An Efficient Distributed Sequential Pattern Mining System), Jan. 2006.
[5] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth,” Proc. 2001 Int'l Conf. Data Eng. (ICDE '01), pp. 215-226, 2001.
[9] M. Garofalakis, R. Rastogi, and K. Shim, “Mining Sequential Patterns with Regular Expression Constraints,” IEEE Trans. on Knowledge and Data Eng., Vol. 14, No. 3, pp. 530-552, May/June 2002.
[10] M. J. Zaki, “Sequence Mining in Categorical Domains: Incorporating Constraints,” Proc. CIKM, pp. 422-429, 2000.

Cited by


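張耕 (2008). 考量時間機率之循序樣式探勘方法 (A Sequential Pattern Mining Method Considering Time Probability) [Master's thesis, Tamkang University]. Airiti Library. https://doi.org/10.6846/TKU.2008.00420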
