近幾年來,利用資料倉儲所儲存的資料方體進行關聯規則採掘的概念逐漸受到重視,陸續有學者提出結合資料方體的採掘方法,並證實此種做法可大幅降低採掘的時間。然而這些研究都假設資料倉儲系統可儲存所有可能的資料方體,未探討當儲存空間有限時,如何選取適當的資料方體加以實體化,以縮短關聯規則採掘的時間。另一方面,過去有關資料方體實體化的選取問題的研究都是針對一般的SQL查詢或OLAP分析,未見有針對資料採掘查詢的研究。本研究的主要目的即在探討在有限的儲存空間下,如何根據使用者所下達的多維度關聯規則查詢,挑選適當的資料方體加以儲存,以減少回答查詢所需的時間。針對此問題,我們明確定義利用資料方體來進行線上關聯規則採掘的模式及其查詢成本的估算方式,並實作及比較幾種啟發式挑選方法,在有限的儲存空間下挑選出最佳的資料方體的組合。
In recent years, the idea of using data cubes stored in a data warehouse to support association rule mining has attracted considerable attention. Researchers have proposed cube-based mining methods and shown that such approaches can significantly reduce mining time. However, these studies all assume that the data warehouse can store every possible data cube, and they do not address how to select an appropriate subset of data cubes to materialize when storage is limited, so as to shorten the execution time of association queries. On the other hand, prior research on the data cube selection problem has focused on general SQL queries or OLAP analysis; no work has addressed data cube selection for data mining queries. The main purpose of this study is to investigate how, under limited storage and given a set of users' multidimensional association queries, an appropriate set of data cubes can be selected for materialization in order to reduce query execution time. To this end, we explicitly define a model for online association rule mining with data cubes and a method for estimating the cost of an association query. We then implement and compare several heuristic selection algorithms that choose a suitable combination of data cubes subject to the space constraint.
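To make the selection problem concrete, the following is a minimal sketch of one possible greedy heuristic. It is not the cost model or any of the algorithms evaluated in this study. It assumes a simple linear cost model in which answering a query costs the row count of the smallest materialized cube that covers the query's dimensions, and it assumes the base cube is always available. All names (`query_cost`, `total_cost`, `greedy_select`) and all sizes and query frequencies are hypothetical and chosen only for illustration.

```python
# Greedy data cube selection under a storage budget (illustrative sketch).
# Assumption: query cost = size of the smallest materialized cube whose
# dimension set is a superset of the query's dimensions.

def query_cost(query_dims, materialized, sizes):
    """Cost of one query given the currently materialized cubes."""
    return min(sizes[c] for c in materialized if query_dims <= c)


def total_cost(queries, materialized, sizes):
    """Frequency-weighted cost of the whole query workload."""
    return sum(freq * query_cost(q, materialized, sizes) for q, freq in queries)


def greedy_select(sizes, queries, base, budget):
    """Repeatedly add the cube with the best cost reduction per unit of space.

    The base cube is assumed to be stored already and is not charged
    against the budget.
    """
    materialized = {base}
    remaining = budget
    while True:
        current = total_cost(queries, materialized, sizes)
        best, best_ratio = None, 0.0
        for cube, size in sizes.items():
            if cube in materialized or size > remaining:
                continue
            benefit = current - total_cost(queries, materialized | {cube}, sizes)
            ratio = benefit / size
            if ratio > best_ratio:
                best, best_ratio = cube, ratio
        if best is None:  # nothing fits, or nothing reduces the cost
            return materialized
        materialized.add(best)
        remaining -= sizes[best]


# Hypothetical cube sizes (row counts) over three dimensions A, B, C.
sizes = {
    frozenset("ABC"): 1000,  # base cube
    frozenset("AB"): 400,
    frozenset("AC"): 300,
    frozenset("BC"): 500,
    frozenset("A"): 50,
    frozenset("B"): 40,
    frozenset("C"): 30,
}
# Hypothetical workload: (dimensions referenced by the query, frequency).
queries = [(frozenset("AB"), 5), (frozenset("A"), 3), (frozenset("C"), 2)]

selected = greedy_select(sizes, queries, base=frozenset("ABC"), budget=450)
workload_cost = total_cost(queries, selected, sizes)
```

With these assumed numbers the heuristic picks the cubes on {C} and {A} and then stops, because the cube on {A, B} no longer fits in the remaining budget. Materializing {A, B} and {C} instead would also fit within the budget and would yield a lower workload cost, which illustrates why different heuristics for this problem need to be compared rather than assumed optimal.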