A Study of Data Mining for Prediction Models of Non-Insulin-Dependent Diabetes Mellitus

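Increased intake of high-calorie foods and reduced physical activity have made diabetes one of the five diseases with the highest incidence and mortality worldwide. Among the rapidly growing number of diabetes patients, most have non-insulin-dependent (type 2) diabetes mellitus (NIDDM). This study therefore applies classification techniques from data mining to the Pima Indians Diabetes Database to build a classification and prediction model for type 2 diabetes. The study first handles attributes that contain wrong values, then proposes an attribute selection method that removes attributes of low importance to classification before mining. In addition, the study proposes a new method for evaluating classification models, which judges model quality by computing an area. The experimental results show that the accuracy of every classification model improved significantly after attribute selection. The area-based evaluation method also avoids the effect of parameter tuning on performance evaluation and distinguishes good models from poor ones more objectively.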
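The preprocessing steps named in the abstract (handling wrong values, then removing low-importance attributes) can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not say how wrong values are corrected or how attribute importance is measured. The sketch assumes that a zero in a physiological attribute is a wrong value and fills it with the median of the valid values, and it uses a hypothetical importance score (difference of class means divided by the overall standard deviation). The function names, the toy rows, and the threshold are all made up for illustration.

```python
from statistics import median


def replace_wrong_zeros(rows, columns):
    """Return a copy of rows where zeros in the given columns are replaced
    by the median of that column's nonzero values (assumed repair rule)."""
    cleaned = [dict(r) for r in rows]
    for col in columns:
        valid = [r[col] for r in rows if r[col] != 0]
        fill = median(valid)
        for r in cleaned:
            if r[col] == 0:
                r[col] = fill
    return cleaned


def separation_score(rows, col, label="label"):
    """Hypothetical importance score: absolute difference between the two
    class means, divided by the overall (population) standard deviation."""
    pos = [r[col] for r in rows if r[label] == 1]
    neg = [r[col] for r in rows if r[label] == 0]
    values = pos + neg
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    if std == 0:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg)) / std


def select_attributes(rows, columns, threshold):
    """Keep only the attributes whose score reaches the threshold."""
    return [c for c in columns if separation_score(rows, c) >= threshold]


# Toy rows (invented values, not taken from the Pima Indians Diabetes Database).
rows = [
    {"glucose": 148, "bmi": 33.6, "noise": 5, "label": 1},
    {"glucose": 0, "bmi": 35.0, "noise": 4, "label": 1},
    {"glucose": 160, "bmi": 0, "noise": 5, "label": 1},
    {"glucose": 85, "bmi": 26.6, "noise": 5, "label": 0},
    {"glucose": 89, "bmi": 28.1, "noise": 4, "label": 0},
    {"glucose": 95, "bmi": 25.0, "noise": 5, "label": 0},
]

cleaned = replace_wrong_zeros(rows, ["glucose", "bmi"])
selected = select_attributes(cleaned, ["glucose", "bmi", "noise"], threshold=0.5)
print(selected)
```

On these toy rows the `noise` attribute has identical class means, so its score is zero and it is dropped before any classifier is trained. A real experiment would replace the score with whatever importance measure the paper defines.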

Keywords

Attribute selection; classification; data mining; diabetes

Parallel Abstract

Increased intake of high-calorie food and reduced physical activity have made diabetes one of the five diseases with the highest incidence and case fatality rates in the world. Among the sharply growing number of diabetes patients, the majority have non-insulin-dependent diabetes mellitus (NIDDM). Therefore, this paper uses the classification techniques of data mining to construct NIDDM prediction models from the Pima Indians Diabetes Database. In addition, this paper proposes a method to measure the importance of each attribute and to delete unimportant attributes before constructing the models. This paper also proposes a new method that evaluates classification performance by calculating the area of a region. The experimental results show that classification performance improves significantly after attribute selection. The new evaluation method also avoids the distortion that parameter adjustment introduces into performance comparisons, so classification systems can be evaluated objectively.
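The abstract does not define which area is computed, so the following is only a stand-in that shows why an area can be insensitive to parameter choices. It uses the area under the ROC curve, a well-known measure that sweeps the decision threshold over every possible value instead of fixing one. The function names and the scores are hypothetical, and the paper's own area measure may differ.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points obtained by
    sweeping the decision threshold over every distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Invented classifier scores and true labels (1 = diabetic, 0 = not).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

auc = area_under_curve(roc_points(scores, labels))
print(round(auc, 3))
```

Because the area summarizes performance across all thresholds, two models can be compared without first agreeing on a single threshold setting, which is the kind of parameter independence the abstract describes.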

Parallel Keywords

Attribute Selection; Classification; Data Mining; Non-Insulin-Dependent Diabetes Mellitus
