Generating Frequent Itemsets on Parallel Machine Architectures

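The first step in mining association rules is to find the itemsets that satisfy a minimum support threshold, called frequent itemsets. This step is the efficiency bottleneck of the whole process, so improving the performance of frequent-itemset generation has become one of the most important research topics in association-rule mining. In this paper, we propose two algorithms for generating frequent itemsets, one for each of two parallel machine architectures. The first is a parallel algorithm on a network structured as a full binary tree of height n-1, which requires only m+n-1 execution steps, where n is the total number of items and m is the total number of transactions. The second is a parallel algorithm on an n-dimensional hypercube network, which requires only m+n execution steps. Both algorithms read each transaction only twice to generate all frequent itemsets.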
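The abstract defines frequent itemsets only in words. As a hedged illustration of that definition (not of either parallel algorithm, whose details the abstract does not give), the sketch below counts the support of every non-empty itemset by brute force. It assumes the minimum support is an absolute transaction count, and it encodes each itemset as an n-bit mask, the same labelling one would use for the 2^n nodes of an n-dimensional hypercube. The function name `frequent_itemsets` and the sample transactions are hypothetical.

```python
def frequent_itemsets(transactions, items, min_support):
    """Hypothetical sequential reference: return {itemset: support_count}
    for every non-empty itemset contained in at least min_support
    transactions (min_support assumed to be an absolute count).

    Each candidate itemset is an n-bit mask: bit i is set iff items[i]
    belongs to the itemset.
    """
    n = len(items)
    index = {item: i for i, item in enumerate(items)}

    # Encode each transaction once as a bitmask over the n items.
    masks = []
    for t in transactions:
        mask = 0
        for item in t:
            mask |= 1 << index[item]
        masks.append(mask)

    result = {}
    for cand in range(1, 1 << n):  # all 2**n - 1 non-empty itemsets
        # A transaction supports cand iff it contains every item of cand.
        count = sum(1 for mask in masks if (mask & cand) == cand)
        if count >= min_support:
            itemset = frozenset(items[i] for i in range(n) if (cand >> i) & 1)
            result[itemset] = count
    return result


# Hypothetical sample data: 4 transactions over 6 items.
transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
items = ["A", "B", "C", "D", "E", "F"]
freq = frequent_itemsets(transactions, items, min_support=2)
for itemset, count in sorted(freq.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

Because it tests all 2^n - 1 candidates against all m transactions, this sequential version is only practical for small n.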

Keywords

Data mining; association rules; frequent itemsets; full binary trees; hypercubes

Abstract (English version)

Generating frequent itemsets is the bottleneck in mining association rules. Therefore, improving the performance of frequent-itemset generation is one of the most important problems in association-rule mining. In this report, two parallel algorithms are presented for generating frequent itemsets, one for each of two parallel machine architectures. The first is a parallel algorithm on a full binary tree of height n-1 that requires m+n-1 execution steps, where n is the number of items and m is the number of transactions. The second is a parallel algorithm on an n-dimensional hypercube that requires m+n execution steps. Both algorithms generate all frequent itemsets while scanning each transaction only twice.
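The abstract does not say how candidate itemsets are assigned to processors. The counting below only shows that the sizes of the two architectures match the itemset lattice over n items exactly, which makes one processor per candidate itemset a natural guess; treat that mapping as an assumption, not as the authors' stated design. Here Q_n is notation introduced for the n-dimensional hypercube.

```latex
\underbrace{\sum_{k=0}^{n-1} 2^{k}}_{\text{nodes of a full binary tree of height } n-1}
  = 2^{n} - 1
  = \underbrace{\sum_{k=1}^{n} \binom{n}{k}}_{\text{non-empty itemsets over } n \text{ items}},
\qquad
|V(Q_n)| = 2^{n} = \sum_{k=0}^{n} \binom{n}{k}.
```

For example, with n = 3 the tree of height 2 has 1 + 2 + 4 = 7 nodes, matching the 7 non-empty subsets of {a, b, c}, and the 3-cube has 8 vertices, one per subset including the empty set.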

Keywords (English version)

data mining; association rules; frequent itemsets; full binary trees; hypercubes
