A Decremental Alternative Rule Extraction Algorithm Based on Rough Set Theory

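In recent years, data mining has become an important research topic in knowledge management, used mainly to uncover meaningful knowledge that is hidden in large databases and hard to observe directly. Dynamic databases are now a common type of database, and managing them frequently involves deleting data. However, most existing data mining algorithms assume that the target database is static. Consequently, after each update, obtaining the rules of the updated database requires a complete rescan of the new database before mining and rule extraction can proceed. Decremental algorithms are a technique used to address this problem. In data mining, Rough Set Theory (RST) is regarded as well suited to handling qualitative data and can reveal important latent facts in the data. Previous literature points out, however, that traditional RST cannot produce rules that carry a priority order and cannot ensure that the classification of a decision table is credible, which tends to yield unfocused rules with low credibility. For this reason, Bill Tseng proposed the Alternative Rule Extraction Algorithm (AREA) in 2008, which addresses these problems on the basis of preference weights and the Strength Index (SI) and, when the generated rules are of equal value, retains them as alternatives. Building on AREA, this study develops the Decremental Alternative Rule Extract Algorithm (DAREA) to handle rule re-extraction after data have been deleted from the database. Rules can be induced efficiently without a complete rescan of the updated database. Finally, this study uses a self-developed application to verify that the decremental algorithm is more efficient than the traditional algorithm.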
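The general idea of a decremental update can be illustrated with a minimal sketch. It is not the DAREA algorithm: it ignores preference weights, reducts, and the Strength Index, and only shows how certain rules of a rough-set decision table can be refreshed after a deletion by touching a single indiscernibility class instead of rescanning the whole table. The class `DecisionTable`, the helper `rescan_rules`, and the sample data are hypothetical names introduced for illustration.

```python
from collections import defaultdict


class DecisionTable:
    """Toy decision table that keeps its indiscernibility classes up to date."""

    def __init__(self, rows):
        # rows: {object_id: (condition_values_tuple, decision)}
        self.rows = dict(rows)
        # classes[condition_values][decision] -> set of object ids
        self.classes = defaultdict(lambda: defaultdict(set))
        for obj, (cond, dec) in self.rows.items():
            self.classes[cond][dec].add(obj)

    def delete(self, obj):
        """Remove one object, touching only its own indiscernibility class."""
        cond, dec = self.rows.pop(obj)
        members = self.classes[cond][dec]
        members.discard(obj)
        if not members:
            del self.classes[cond][dec]
        if not self.classes[cond]:
            del self.classes[cond]

    def certain_rules(self):
        """Map condition values to a decision when the class is consistent."""
        return {
            cond: next(iter(by_dec))
            for cond, by_dec in self.classes.items()
            if len(by_dec) == 1
        }


def rescan_rules(rows):
    """Baseline: rebuild everything from scratch (the non-decremental route)."""
    return DecisionTable(rows).certain_rules()


# Hypothetical sample data: x1 and x2 share conditions but disagree on the decision.
rows = {
    "x1": (("high", "yes"), "accept"),
    "x2": (("high", "yes"), "reject"),
    "x3": (("low", "no"), "reject"),
    "x4": (("low", "yes"), "accept"),
}
table = DecisionTable(rows)
before = table.certain_rules()  # the ("high", "yes") class is inconsistent
table.delete("x2")              # decremental update, no full rescan
after = table.certain_rules()   # ("high", "yes") now yields a certain rule
```

In this sketch the deletion costs work proportional to one class, while `rescan_rules` walks every remaining object; both routes give the same certain rules on the sample data.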

Keywords

Data mining; dynamic database; Rough Set Theory; decremental algorithm; rule induction

Parallel Abstract

Data mining, which extracts useful information and knowledge from large databases, has become an important research area in recent years. Dynamic databases are common in business, and data deletion is a frequent operation in managing them. Unfortunately, most existing data-mining algorithms assume that the database is static, so after data are deleted all patterns must be recomputed by rescanning the updated database before mining and rule extraction can proceed. The decremental technique handles removed objects in a dynamic database without rerunning the data-mining (DM) algorithm. Rough Set Theory (RST) is one of the promising DM approaches to knowledge discovery, pattern recognition, decision analysis, and related tasks. RST is a knowledge discovery tool that helps induce logical patterns hidden in massive data. However, previous rough set (RS) approaches cannot produce rules that carry a preference order, and therefore cannot generate more meaningful and general rules. Induction based on RS also often generates too many unfocused rules and cannot guarantee that the classification of a decision table is credible. Tseng (2008) proposed the Alternative Rule-Extraction Algorithm (AREA) to solve these problems by discovering preference-based rules from the reducts with the maximum strength index (SI), specifically covering the case in which the desired reduct is not unique because several reducts share the same SI value. Building on AREA, this study proposes the Decremental Alternative Rule Extract Algorithm (DAREA) to handle objects removed from the database. The algorithm does not need to recompute the rule sets from scratch and can therefore generate complete rules quickly. Experiments are conducted to validate that the proposed approach outperforms the traditional RS approach.

Parallel Keywords

Data mining; Dynamic Database; Rough Set Theory; Decremental Algorithm; Rule Induction

References

Ahn, B.S., Cho, S.S. and Kim, C.Y., 2000, “The integrated methodology of rough set theory and artificial neural network for business failure prediction,” Expert Systems with Applications, Vol. 18, No. 2, pp. 65-74.


Asharaf, S., Murty, M. Narasimha and Shevade, S.K., 2006, “Rough set based incremental clustering of interval data,” Pattern Recognition Letters, Vol. 27, No. 6, pp. 515-519.


Aumann, Yonatan, Feldman, Ronen, Lipshtat, Orly and Mannila, Heikki, 1999, “Borders: An Efficient Algorithm for Association Generation in Dynamic Databases,” Journal of Intelligent Information Systems, Vol. 12, No. 1, pp. 61-73.


Blaszczynski, Jerzy and Słowiński, Roman, 2003, “Incremental Induction of Decision Rules from Dominance-based Rough Approximations,” Electronic Notes in Theoretical Computer Science, Vol. 82, No. 4, pp. 40-51.


Breault, Joseph L., 2001, “Data mining diabetic databases: are rough sets a useful addition?,” Proceedings of the Computing Science and Statistics, Vol. 33.
