使用模糊Apriori演算法於數值型資料關聯規則探勘之研究-以糖尿病資料為例

Using a Fuzzy Apriori Algorithm to Mine Association Rules in Numerical Data: Using Diabetes Data as an Example

Abstract


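Many association rule algorithms exist, and the Apriori algorithm is the earliest proposed and the most representative. Association rule mining is often called market basket analysis: it judges each item by whether it appears (1 for present, 0 for absent), counts how often each item occurs across the shopping lists, and derives association rules from the frequent itemsets. In practice, however, many data are numerical, such as blood pressure, height, and age. This study proposes an Apriori algorithm combined with fuzzy theory. The numerical data are first fuzzified, a membership function is constructed for each fuzzy linguistic term, and each record's value is converted into fuzzy sets with the corresponding membership degrees. Finally, following the original Apriori algorithm, support and confidence calculations based on membership degrees are designed, from which association rules can be found. In addition, according to statistics from the Health Promotion Administration of the Ministry of Health and Welfare, diabetes is one of the ten leading causes of death in Taiwan: nearly ten thousand people die of diabetes each year, there are more than two million diabetes patients nationwide, and the number keeps rising every year. Many of the attributes related to diabetes are continuous numerical data. This study therefore examines the Pima Indians diabetes data from the UCI public repository, mining it with the proposed fuzzy Apriori algorithm to find association rules for diabetes among the Pima and thereby verify the feasibility of the proposed method. The study finds that the proposed algorithm can find association rules in numerical data and that some of the rules found are helpful for diagnosing diabetes.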
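
The fuzzification step described above can be illustrated with a minimal Python sketch. The abstract does not say which membership function shapes, linguistic terms, or breakpoints the study uses, so everything here is an assumption made for illustration: the shoulder and triangular shapes, the three labels, the breakpoints 25, 40, and 55, and the hypothetical helper `fuzzify_age`.

```python
# Illustrative sketch only: shapes, labels, and breakpoints are assumptions,
# not the membership functions used in the study.

def left_shoulder(x, a, b):
    """Degree 1 at or below a, falling linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def right_shoulder(x, a, b):
    """Degree 0 at or below a, rising linearly to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def triangle(x, a, b, c):
    """Degree 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify_age(age):
    """Map one numeric age to membership degrees in three assumed fuzzy sets."""
    return {
        "age=young": left_shoulder(age, 25, 40),
        "age=middle": triangle(age, 25, 40, 55),
        "age=old": right_shoulder(age, 40, 55),
    }


if __name__ == "__main__":
    # A single numeric value becomes several fuzzy items with partial degrees.
    print(fuzzify_age(31))
```

With these assumed breakpoints, a value between two breakpoints belongs partly to two neighbouring sets, so one numeric attribute becomes several fuzzy items whose degrees replace the 0/1 flags of classical Apriori.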

Parallel Abstract


There are many association rule algorithms, and the Apriori algorithm is the earliest and most representative. Association rule mining, also called market basket analysis, records an item as 1 if it appears in a transaction and as 0 otherwise. Itemsets whose support exceeds a preset threshold are retained as large (frequent) itemsets, and association rules are induced from them. However, much real-world data, such as blood pressure, height, and age, is represented as continuous numerical values. To induce association rules from such continuous data, this study proposes a fuzzy Apriori algorithm. First, the numerical data are transformed into fuzzy sets, and a membership function is created for each fuzzy set. The membership value of each numerical datum in each fuzzy set is then derived and used in place of the binary occurrence count. Finally, following the original Apriori algorithm, support and confidence formulas based on these membership values are derived and used to induce the association rules. According to statistics from the Health Promotion Administration, diabetes is one of the ten leading causes of death in Taiwan. Nearly ten thousand people die of diabetes in Taiwan each year, the country has more than two million diabetes patients, and the number is still increasing every year. Many of the attributes related to diabetes are continuous numerical data. Therefore, this study applies the proposed fuzzy Apriori algorithm to the Pima Indians diabetes data from the UCI repository to mine association rules for diabetes and thereby verify the feasibility of the method. The results show that the proposed fuzzy Apriori algorithm does find association rules in the numerical diabetes data and that some of these rules are useful for the diagnosis of diabetes.
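
The membership-based support and confidence mentioned above can be sketched as follows. The abstract does not give the study's formulas, so this sketch assumes one common formulation: a record's degree for an itemset is the minimum of its item membership degrees, support is the mean of that degree over all records, and confidence is the ratio of two supports. The function names and the four records are hypothetical and are not taken from the Pima data.

```python
# Illustrative sketch only: the min operator and the toy records are
# assumptions, not the formulas or data reported in the study.

def fuzzy_support(records, itemset):
    """Mean over records of the smallest membership degree among the items."""
    if not records:
        return 0.0
    total = sum(min(rec.get(item, 0.0) for item in itemset) for rec in records)
    return total / len(records)


def fuzzy_confidence(records, antecedent, consequent):
    """Support of antecedent and consequent together, divided by antecedent support."""
    base = fuzzy_support(records, antecedent)
    if base == 0:
        return 0.0
    return fuzzy_support(records, list(antecedent) + list(consequent)) / base


# Hypothetical fuzzified records: each value is a membership degree in [0, 1].
records = [
    {"glucose=high": 0.8, "bmi=high": 0.5, "diabetes=yes": 1.0},
    {"glucose=high": 0.2, "bmi=high": 0.6, "diabetes=yes": 0.0},
    {"glucose=high": 1.0, "bmi=high": 0.9, "diabetes=yes": 1.0},
    {"glucose=high": 0.0, "bmi=high": 0.1, "diabetes=yes": 0.0},
]

if __name__ == "__main__":
    print(fuzzy_support(records, ["glucose=high"]))
    print(fuzzy_confidence(records, ["glucose=high"], ["diabetes=yes"]))
```

When every degree is exactly 0 or 1, both functions reduce to the ordinary Apriori support and confidence, which is why the level-wise candidate generation of the original algorithm can be reused unchanged under this assumption.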
