Using Neural Network to Predict Colon Abnormalities with Health Examination Items

Advisor: 孫德修


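In recent years, the public has paid increasing attention to health examinations. The purpose of a health examination is to detect and treat disease early, and it emphasizes the idea that prevention is better than cure. Through health examination screening, some diseases may be detected early and thus treated appropriately. This study found that cancer deaths have been rising year by year and that colorectal cancer has ranked first in the national cancer rankings for three consecutive years, so its impact can no longer be ignored. This study builds mainly on the colon abnormality prediction model that Yi-Siang Lin (2011) constructed from health examination data. Using the neural network method to construct prediction models, it examines how 10-fold (K=10) cross-validation and different neural network parameters affect the prediction results, how imbalanced data affect prediction performance, and whether grouping highly correlated health examination items (versus leaving them ungrouped) and subdividing the colonoscopy results into finer categories affect the performance of colon abnormality prediction. Regarding the balance of the colonoscopy data, that is, the ratio of normal to abnormal colonoscopy results in the samples, this study found that when the normal/abnormal ratio of the training sample is lower than that of the testing sample, the accuracy of the prediction model is lower. In the prediction model that groups highly correlated health examination items and divides colonoscopy results into two classes (normal and abnormal), 5-fold cross-validation gives better accuracy than 10-fold cross-validation. Comparing the performance of the grouped and ungrouped models, the ungrouped model has lower accuracy (59.68%) than the grouped model (63.98%), which indicates that grouping the factors improves model accuracy.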


In recent years, people have paid increasing attention to health examinations. The purpose of a health examination is to detect disease at an early stage. Through health examination screening, some diseases may be found early and treated properly. The literature shows that cancer mortality has increased in recent years. The mortality of colorectal cancer ranks first among all cancers and cannot be ignored. In this study, the health examination data from Yi-Siang Lin (2011) are used to construct a prediction model for colon abnormalities using the neural network method. Cross-validation with different numbers of folds (K) is applied to the neural network models; different parameter settings are used to assess the performance of the prediction models; sample data with different degrees of imbalance are used to assess the performance of the models; and the impact of clustering highly correlated health examination items on model prediction is also evaluated. In the study of imbalanced sample data, when the normal/abnormal ratio of colonoscopy results in the training sample is higher than that in the testing sample, the prediction accuracy is lower. When the highly correlated health examination items are clustered and the colonoscopy results are divided into two categories (normal/abnormal), 5-fold cross-validation gives better accuracy than 10-fold cross-validation. Finally, when the highly correlated health examination items are clustered, the accuracy of the model (63.98%) is better than that of the model without clustering (59.68%).
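
The abstract compares 5-fold and 10-fold cross-validation and relates accuracy to the normal/abnormal ratio in the training and testing samples. The following is a minimal sketch of how such splits and ratios could be produced, not the procedure used in the study. It assumes a stratified split, which the abstract does not state; the function names `stratified_k_fold` and `normal_abnormal_ratio` and the synthetic labels are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) for k folds with similar class ratios.

    Assumption: stratification is used here for illustration only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin into the k folds.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)

    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


def normal_abnormal_ratio(labels, indices):
    """Ratio of normal (0) to abnormal (1) samples among the given indices.

    Requires at least one abnormal sample among the indices.
    """
    normal = sum(1 for i in indices if labels[i] == 0)
    abnormal = sum(1 for i in indices if labels[i] == 1)
    return normal / abnormal


# Hypothetical labels, not study data: 0 = normal, 1 = abnormal.
labels = [0] * 120 + [1] * 60

for k in (5, 10):
    train_idx, test_idx = next(stratified_k_fold(labels, k))
    print(
        f"K={k}: train size {len(train_idx)}, test size {len(test_idx)}, "
        f"train ratio {normal_abnormal_ratio(labels, train_idx):.2f}, "
        f"test ratio {normal_abnormal_ratio(labels, test_idx):.2f}"
    )
```

With a stratified split the training and testing ratios stay close to each other in every fold. An unstratified split can let them drift apart, which is the situation the abstract links to lower accuracy.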


