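Diabetes is a chronic disease that current medicine finds difficult to cure, and the number of deaths caused by its complications rises year by year. The association between diabetes and cancer has been widely studied in recent years. Breast cancer has the highest incidence of any cancer among women in Taiwan. The purpose of this study is therefore to use data mining techniques to build a disease risk factor analysis model for studying the association between diabetes and breast cancer, with the aim of identifying the diabetic complications most closely associated with breast cancer in female diabetic patients. The study uses a retrospective cohort design. Its subjects are female diabetic patients in the National Health Insurance Research Database from 2005 to 2012, and it analyzes the disease risk factors for developing breast cancer within the following two years. The analysis model first applies under sampling based on clustering (SBC) to handle the class imbalance present in the National Health Insurance data. It then uses diabetic complications as predictor variables, and finally uses classification and regression trees (CART) to build a model that predicts breast cancer in female diabetic patients and identifies the important disease risk factors. The results show that female diabetic patients with "diabetic polyneuropathy" or "diabetes mellitus with peripheral circulatory disorders" have a significantly higher odds ratio of developing breast cancer. These two diabetic complications are therefore important disease risk factors more closely associated with breast cancer. The proposed analysis model exploits the strengths of data mining to identify important disease risk factors for breast cancer in female diabetic patients, providing useful information for medical reference.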
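For reference, an odds ratio of the kind reported above is conventionally computed from a 2×2 table. The symbols below are introduced here for illustration only. The abstract does not state how the odds ratios were estimated. Let $a$ be the number of patients with a given complication who developed breast cancer within two years, $b$ those with the complication who did not, $c$ those without the complication who developed breast cancer, and $d$ those without the complication who did not:

$$
\mathrm{OR} = \frac{a/b}{c/d} = \frac{ad}{bc}
$$

An $\mathrm{OR}$ greater than 1 means that the odds of developing breast cancer are higher among patients with the complication than among those without it.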
Diabetes is a chronic disease that current medical technology cannot cure, and deaths caused by diabetic complications have increased year by year. Breast cancer incurs large medical expenses and places a heavy burden on the National Health Insurance system. The relationship between diabetes and breast cancer has attracted considerable research attention in recent years. Among all cancers, breast cancer has the highest incidence in Taiwanese women. The purpose of this study is therefore to apply data mining techniques in a disease risk factor analysis scheme for analyzing the relationship between diabetes and breast cancer. The proposed scheme combines under-sampling based on clustering (SBC), which handles the class imbalance problem, with classification and regression trees (CART), which build the classification model and select the important risk factors. The data were collected from the National Health Insurance Research Database of Taiwan. They cover diabetic patients who had no breast cancer at the start of observation and record whether each patient developed breast cancer within the following two years. Experimental results show that the proposed scheme identifies "diabetic neuropathy" and "diabetes mellitus with peripheral circulatory disorder" as important risk factors. Female diabetic patients with either of these risk factors have a higher incidence of breast cancer than those without them. This paper thus provides an effective disease prediction model for identifying important risk factors and recognizing female diabetic patients at risk of developing breast cancer.
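The SBC step can be sketched as follows. This is a minimal sketch, not the authors' implementation. It follows the commonly cited SBC rule, which draws majority-class samples from each cluster in proportion to that cluster's majority-to-minority ratio. Several details are assumptions not given in the abstract: k-means clustering, the hypothetical function names `sbc_undersample` and `kmeans_labels`, and the parameters `k` (number of clusters) and `ratio` (desired majority:minority ratio). A small Lloyd's k-means is written out so that the example needs only NumPy.

```python
import numpy as np


def kmeans_labels(X, k, n_iter=50, seed=0):
    """Minimal Lloyd's k-means (assumed clustering step); returns one cluster index per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each sample to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members; an empty cluster keeps its old center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def sbc_undersample(X, y, k=3, ratio=1.0, seed=0):
    """Hypothetical SBC sketch: y == 1 is the minority class, y == 0 the majority class."""
    rng = np.random.default_rng(seed)
    labels = kmeans_labels(X, k, seed=seed)
    n_minority = int((y == 1).sum())
    target_majority = int(round(ratio * n_minority))

    # Majority-to-minority ratio inside each cluster. Flooring the minority
    # count at 1 is an assumption made here to avoid division by zero.
    cluster_ratio = np.array([
        ((labels == j) & (y == 0)).sum() / max(((labels == j) & (y == 1)).sum(), 1)
        for j in range(k)
    ])
    weights = cluster_ratio / cluster_ratio.sum()

    keep = list(np.flatnonzero(y == 1))  # every minority sample is kept
    for j in range(k):
        majority_idx = np.flatnonzero((labels == j) & (y == 0))
        # Per-cluster quota; rounding means the total may differ slightly from the target.
        n_pick = min(int(round(target_majority * weights[j])), len(majority_idx))
        if n_pick > 0:
            keep.extend(rng.choice(majority_idx, size=n_pick, replace=False))
    keep = np.sort(np.array(keep, dtype=int))
    return X[keep], y[keep]
```

For example, suppose `y` marks patients who developed breast cancer as 1. Then `sbc_undersample(X, y, k=3, ratio=1.0)` keeps every positive case and draws roughly as many negatives, and the balanced set could then be passed to a CART learner. Weighting clusters by their majority-to-minority ratio draws more negatives from clusters dominated by negatives and fewer from clusters where positives are common, which is how SBC is usually described.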