Rough set theory, proposed by Z. Pawlak of Poland in 1982, is a mathematical approach to handling vague information that can discover important facts hidden in data. However, previous literature indicates that traditional rough set approaches cannot guarantee that the classification of a decision table is credible. Therefore, Tseng (2006), of the University of Texas at Austin, proposed the REA (Rule-Extraction Algorithm) to solve this problem. The desired reducts, however, are not necessarily unique, because several reducts can have the same SI (strength index) value. Tseng (2008) therefore proposed the AREA (Alternative Rule Extraction Algorithm) to solve the resulting problem of incomplete rule extraction. In addition, although current rough set algorithms can generate a set of classification rules efficiently, they cannot generate new rules incrementally when a new object is added, and most existing incremental approaches cannot handle large databases. This study therefore proposes an incremental rule-extraction algorithm based on the AREA. Because the AREA may generate repetitive rules, the algorithm is slightly modified to exclude them during the solution search procedure. With the proposed algorithm, when a new object or attribute is added to the information system, the rule set does not have to be re-induced from the original data. Complete, non-repetitive rules can be generated quickly instead. The algorithm thus resolves the problem of adding new data and saves the time otherwise spent on repeated recomputation.
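The abstract does not spell out how REA or AREA work, including how the strength index is computed. The following is therefore only a minimal Python sketch of the general incremental idea it describes, and it is not the authors' algorithm. It makes several simplifying assumptions. Rules are certain rules read off the equivalence classes of the indiscernibility relation on a fixed subset of condition attributes; there is no reduct or SI-based search. Only new objects are handled; new attributes are not. The class name `IncrementalRuleSet`, its methods, and the example attributes are hypothetical.

```python
from collections import defaultdict


class IncrementalRuleSet:
    """Hypothetical sketch of incremental certain-rule extraction.

    NOT REA or AREA: rules come from equivalence classes over a fixed
    attribute subset, with no reduct or strength-index search.
    """

    def __init__(self, attributes):
        self.attributes = tuple(attributes)  # condition attributes used in rules
        # condition-value tuple -> set of decisions observed in that class
        self.classes = defaultdict(set)
        # condition-value tuple -> decision; the dict key makes duplicate rules impossible
        self.rules = {}

    def add_object(self, obj, decision):
        # Only the new object's equivalence class is touched; every other
        # class and its rule stays as it was, so nothing is recomputed from scratch.
        key = tuple(obj[a] for a in self.attributes)
        seen = self.classes[key]
        seen.add(decision)
        if len(seen) == 1:
            # Consistent class (lower approximation of one decision): certain rule.
            self.rules[key] = decision
        else:
            # Conflicting decisions: the class falls into the boundary region.
            self.rules.pop(key, None)

    def describe(self):
        # Render rules as IF ... THEN ... strings (assumes comparable values, e.g. str).
        out = []
        for key, decision in sorted(self.rules.items()):
            conds = " AND ".join(f"{a}={v}" for a, v in zip(self.attributes, key))
            out.append(f"IF {conds} THEN {decision}")
        return out


if __name__ == "__main__":
    rs = IncrementalRuleSet(["outlook", "windy"])
    rs.add_object({"outlook": "sunny", "windy": "no"}, "play")
    rs.add_object({"outlook": "rainy", "windy": "yes"}, "stay")
    rs.add_object({"outlook": "sunny", "windy": "no"}, "play")   # repeat: rule kept once
    rs.add_object({"outlook": "rainy", "windy": "yes"}, "play")  # conflict: rule withdrawn
    for rule in rs.describe():
        print(rule)  # prints "IF outlook=sunny AND windy=no THEN play"
```

Keying the rule store by the condition-value tuple is one simple way to keep the rule set free of repeats as objects arrive. Each insertion costs time proportional to the number of condition attributes rather than to the size of the table.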