Because people are eating more high-calorie food and getting less physical activity, diabetes has become one of the five diseases with the highest incidence and mortality rates worldwide. Among the rapidly growing number of diabetic patients, most have non-insulin-dependent (type 2) diabetes mellitus (Non-Insulin-Dependent Diabetes Mellitus, NIDDM). This study therefore applies the classification techniques of data mining to the Pima Indians Diabetes Database to build a model that predicts type 2 diabetes. The study first handles the attributes that contain wrong values. It then proposes an attribute selection method that measures how important each attribute is to the classification and deletes the less important attributes before the models are built. The study also proposes a new method for evaluating classification models, which judges a model's quality by calculating an area. The experimental results show that every classification model's accuracy improves significantly after attribute selection. The area-based evaluation method also keeps parameter adjustments from distorting the performance assessment, so it distinguishes better models from worse ones more objectively.
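The abstract does not define how the evaluation area is computed. One plausible reading, which is an assumption here, is a ROC-style area. The classifier's decision threshold is swept, the (false-positive rate, true-positive rate) pair is recorded at each setting, and the resulting curve is integrated with the trapezoidal rule. A single accuracy figure depends on the threshold that happens to be chosen, whereas the area summarizes every setting at once. That property matches the abstract's claim that the method is not affected by parameter adjustment. This is the standard ROC AUC, substituted here for illustration, and the thesis's own method may differ. The function names `roc_points` and `area_under_curve` are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Sweep the decision threshold over every distinct score (strictest first)
    and return the (false-positive rate, true-positive rate) point for each.
    Assumption: label 1 = diabetic (positive), label 0 = non-diabetic."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative case")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        # Predict "positive" whenever the score reaches the threshold t.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points  # the lowest threshold predicts everything positive: (1, 1)


def area_under_curve(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under a piecewise-linear curve given as (x, y) points."""
    pts = sorted(points)  # ROC points are monotone in both coordinates
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


if __name__ == "__main__":
    # Toy scores from a hypothetical classifier; not data from the thesis.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    print(round(area_under_curve(roc_points(scores, labels)), 3))  # 0.889 (= 8/9)
```

An area of 1.0 means the model ranks every positive case above every negative case, and 0.5 is what random guessing would give on average. Two models can therefore be compared by a single number that does not depend on a tuned threshold.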