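Association rules are widely used in data mining research. Most previous studies mine frequent itemsets with relatively high support, but they cannot quickly and effectively discover significant rare data, that is, data with low support but important associations, also called semi-frequent itemsets. Some recent studies have designed mining methods for rare data that carry important association rules. Most of these methods use a bottom-up search and often cannot mine maximal semi-frequent itemsets efficiently. To address this problem, this study proposes and designs the Maximum Semi-frequent Itemsets Algorithm (MSIA), a maximal semi-frequent itemset mining algorithm aimed specifically at significant rare data. MSIA integrates the mining concepts of clustering and decomposition, combines them with filtering and relative support analysis, and uses a top-down search mechanism to mine maximal semi-frequent itemsets efficiently. Performance experiments show that MSIA reduces the number of read scans of the source database during mining, which improves mining performance and saves mining time, so that the maximal semi-frequent itemsets in significant rare data can be obtained effectively and quickly.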
Mining association rules is a popular research topic in data mining. In recent years, many studies have focused on discovering important association rules from frequent itemsets, using support and confidence criteria that favor high-support itemsets. Significant rare data, i.e., semi-frequent itemsets, also carry important association rules, but these rules are not easy to mine with traditional methods. Some mining methods based on a bottom-up policy cannot efficiently mine association rules from long semi-frequent itemsets: the time complexity of the mining process is very high because repeatedly scanning the source database generates a large number of candidates. This research proposes the maximum semi-frequent itemsets algorithm (MSIA), which quickly and efficiently mines association rules from significant rare data. MSIA is a top-down approach that combines clustering, decomposition, filtering, and relative support to search the source database efficiently. Experimental results show that MSIA decreases the time complexity of scanning the database and significantly reduces the number of candidate itemsets. MSIA thus efficiently mines useful association rules from the maximum semi-frequent itemsets.
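To make the terms above concrete, the following is a minimal Python sketch of support, relative support, and the "maximal" property of an itemset. It is not the MSIA procedure itself, whose clustering, decomposition, and filtering steps are not specified here. The relative support formula (the largest ratio of the itemset's support to the support of any single item in it) is an assumption taken from common usage in rare-itemset mining, and the function names and toy transactions are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def relative_support(itemset, transactions):
    """ASSUMED definition: max over items i of sup(itemset) / sup({i}).

    A rare itemset scores high when its items almost always occur together,
    even though its plain support is low.
    """
    joint = support(itemset, transactions)
    ratios = []
    for item in itemset:
        single = support({item}, transactions)
        if single > 0:
            ratios.append(joint / single)
    return max(ratios, default=0.0)


def maximal_only(itemsets):
    """Keep only itemsets that are not a proper subset of another one."""
    sets = [frozenset(s) for s in itemsets]
    return [s for s in sets if not any(s < other for other in sets)]


# Hypothetical toy database: bread/milk are frequent, caviar/vodka are rare.
transactions = [frozenset(t) for t in [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread"},
    {"milk"},
    {"bread", "milk"},
    {"caviar", "vodka"},
    {"bread", "milk"},
    {"caviar", "vodka"},
    {"bread"},
]]

rare_pair = {"caviar", "vodka"}
print(support(rare_pair, transactions))           # low plain support
print(relative_support(rare_pair, transactions))  # high relative support
print(maximal_only([{"caviar"}, {"caviar", "vodka"}, {"bread"}]))
```

In this toy data the pair caviar/vodka would be discarded by a high minimum support threshold, yet its items always appear together, which is the kind of pattern a relative support measure is meant to surface.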