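As lifestyles change, many chronic diseases have become common causes of death. Because their early symptoms are often inconspicuous and they readily interact with other diseases, one condition can trigger another, which makes prevention and treatment more difficult. Many previous studies have screened risk factors and built predictive models for a single disease, but few have considered several mutually influencing diseases at the same time. Moreover, since the human body can be thought of as a complex factory, how to apply the concept of reliability to assessing the health of the body's physiological state under a specific combination of diseases is a topic worth studying. This study proposes an analysis process that begins with screening the common physiological indicators of multiple diseases and ends with building a predictive model that can predict the physiological states of multiple diseases simultaneously. In addition, by analyzing how changes in the measured value of each common physiological indicator affect the occurrence of specific multiple diseases, the study assesses the disease risk associated with each common physiological indicator and with the combination of multiple diseases.

The analysis process has three parts. First, in the stage of screening common physiological indicators, several different data mining techniques are each used to select physiological indicators across all subjects, and majority voting then identifies the common physiological indicators of the multiple diseases. Second, the identified common physiological indicators are used to build a predictive model for the multiple diseases with multi-response prediction methods. Finally, kernel density estimation is used to fit the distribution of each common physiological indicator under different physiological states, and the probability that a subject is diseased is then calculated for different values of each indicator. The multiple-disease risk defined in this study is the product, over all common physiological indicators, of the probabilities that the subject's measured value does not belong to the healthy group.

Three different databases serve as cases to illustrate the analysis process. In each case, six classification techniques, including logistic regression, decision trees, and discriminant analysis, are combined into a multiple classifier to screen the common physiological indicators of the multiple diseases, and methods such as multivariate adaptive regression splines and artificial neural networks are then used to build the predictive model. In the University of California heart disease database, the four feature variables thalassemia (thal), number of major vessels colored by fluoroscopy (ca), chest pain type (cp), and exercise-induced angina (exang) are the common physiological indicators of the heart diseases, and a multilayer perceptron neural network built on these four indicators achieves a predictive accuracy of 67.16%. In the survey database on the prevalence of hypertension, hyperlipidemia, and hyperglycemia provided by the Department of Health, the common physiological indicators selected by the multiple classifier are fasting plasma glucose (FPG), total cholesterol (T-CHO), triglyceride (TG), systolic blood pressure (SBP), and diastolic blood pressure (DBP), and the multilayer perceptron neural network predictive model achieves a predictive accuracy of 98.91%. A simulation based on the results of Stewart K. J. et al. (2005) shows that six weeks of exercise maintenance, three times per week, reduces the likelihood of developing these three conditions by about 7.29% on average. Analysis of a health examination database obtained from a teaching hospital in Taiwan shows that gender, total cholesterol (T-CHO), systolic blood pressure (SBP), and diastolic blood pressure (DBP) are the common physiological indicators of hypertension and hyperlipidemia. The two-disease predictive models built with multivariate adaptive regression splines or neural networks both achieve an overall predictive accuracy above 92%. A simulation based on the results of Lewis et al. (1976) shows that a 17-week exercise intervention, consisting of 2.5 miles of jogging and walking and one hour of calisthenics per week, reduces the likelihood of developing hypertension or hyperlipidemia by 4.63% on average.

The multiple-disease analysis process proposed in this study can identify the common physiological indicators of mutually influencing diseases and build a predictive model that distinguishes the various physiological states. In addition, once the common physiological indicators have been identified, the health of the body's physiological state can be assessed, and the improvement in health after maintenance or intervention activities can be quantified.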
Chronic diseases have become major causes of death as lifestyles change. Their initial symptoms are usually not obvious, and they can induce or aggravate other diseases, which makes prevention and treatment difficult. Many previous studies have built predictive models for a single disease. However, these studies overlook that associated diseases may have reciprocal effects and that abnormalities in physiological indicators can signal several associated diseases at once. In addition, reliability, or the risk of failure, is a commonly used concept, and applying it to the assessment of physiological systems in the human body is an interesting issue. In this study, we developed an analysis process that selects the common physiological indicators of multiple diseases and constructs a predictive model for multiple physiological conditions. Moreover, the effect of changes in the values of the common physiological indicators across physiological states was analyzed in order to construct a disease risk model for physiological systems. In the first part of the analysis process, various data mining techniques were combined in a multiple classifier system, and majority voting was used to extract the common physiological indicators of multiple diseases. The second part constructed predictive models for multiple diseases using the common physiological indicators as predictors. In the third part, kernel density estimation was applied to fit the distribution of each common physiological indicator and to estimate the probability that a measured value belongs to the healthy condition. The reliability of the physiological system was defined as the product, over all common physiological indicators, of these probabilities of belonging to the healthy condition. Three cases were used to illustrate the analysis process.
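The majority-voting step in the first part of the analysis process can be sketched as follows. This is a minimal illustration and not the study's implementation: the function name `majority_vote`, the strict-majority threshold, and the example votes (including the indicator `BMI`) are assumptions made up for the example.

```python
from collections import Counter


def majority_vote(selections, threshold=None):
    """Return the indicators kept by at least `threshold` selectors.

    selections: dict mapping a selector name to the set of indicators
    that selector retained. By default an indicator must be retained
    by a strict majority of the selectors (an assumed voting rule).
    """
    n_selectors = len(selections)
    if threshold is None:
        threshold = n_selectors // 2 + 1  # strict majority
    # Count, for every indicator, how many selectors retained it.
    votes = Counter(
        indicator
        for chosen in selections.values()
        for indicator in set(chosen)
    )
    return sorted(ind for ind, count in votes.items() if count >= threshold)


# Hypothetical votes from three selectors; these are NOT results from the study.
example_selections = {
    "logistic_regression": {"FPG", "T-CHO", "SBP", "DBP"},
    "decision_tree": {"FPG", "TG", "SBP", "DBP", "BMI"},
    "discriminant_analysis": {"FPG", "T-CHO", "TG", "SBP"},
}

common_indicators = majority_vote(example_selections)
print(common_indicators)
```

With these made-up votes, `BMI` is dropped because only one of the three selectors retained it, while every indicator retained by at least two selectors is kept.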
In each case, six data mining techniques, including logistic regression, decision trees, and discriminant analysis, were first combined to select the common physiological indicators of multiple diseases. Multivariate adaptive regression splines (MARS) and artificial neural networks (ANN) were then used to construct a predictive model for these diseases. In the UCI heart disease dataset, thalassemia (thal), number of major vessels colored by fluoroscopy (ca), chest pain type (cp), and exercise-induced angina (exang) were the common physiological indicators of heart diseases. The highest predictive accuracy achieved with these indicators by a multilayer perceptron neural network (MLPNN) was 67.16%. The second dataset is a survey on the prevalence of hyperglycemia, hyperlipidemia, and hypertension in Taiwan, provided by the Bureau of Health Promotion, Department of Health, Taiwan. The common physiological indicators of these three diseases were fasting plasma glucose (FPG), total cholesterol (T-CHO), triglyceride (TG), systolic blood pressure (SBP), and diastolic blood pressure (DBP). These indicators are consistent with the clinical guidelines, and the MLPNN predictive model built on them achieved an accuracy of 98.91%. Simulating the result of Stewart et al. (2005), this study estimated that the probability of suffering from hypertension, hyperlipidemia, or hyperglycemia would fall by approximately 7.29% on average after a 6-week exercise program with three sessions per week. The third dataset was obtained from the health examination center of a hospital. According to the analysis process, the common physiological indicators of hypertension and hyperlipidemia were sex, total cholesterol, SBP, and DBP. The predictive models constructed with either MARS or ANN all achieved an accuracy of over 92%.
Moreover, the reliability and maintainability of physiological systems can be quantified by applying kernel density estimation to the distributions of the aforementioned four common physiological indicators under each physiological condition. Simulating the finding of Lewis et al. (1976), this study estimated that the probability of suffering from hypertension or hyperlipidemia would fall by 4.63% on average after a 17-week exercise program. The analysis process proposed in this study enables the selection of the common physiological indicators of multiple diseases and the construction of a predictive model for these diseases. Moreover, the reliability of physiological systems and the effect of maintenance activities on human health can be quantified.
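The reliability calculation described above can be sketched in a few lines. The abstract does not state how the per-indicator probability is derived from the fitted densities, so the sketch assumes a Gaussian kernel, a normal-reference bandwidth, and a Bayes-style ratio of healthy and diseased densities weighted by group size. All function names and sample values are hypothetical and are not data from the study.

```python
import math


def gaussian_kde(samples, bandwidth=None):
    """Return a Gaussian kernel density estimate fitted to `samples`."""
    n = len(samples)
    if bandwidth is None:
        mean = sum(samples) / n
        std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
        bandwidth = 1.06 * std * n ** (-1 / 5)  # normal-reference rule of thumb

    def density(x):
        kernel_sum = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        return kernel_sum / (n * bandwidth * math.sqrt(2 * math.pi))

    return density


def prob_healthy(x, healthy, diseased):
    """Assumed rule: posterior probability that value x comes from the healthy group."""
    prior_healthy = len(healthy) / (len(healthy) + len(diseased))
    num = prior_healthy * gaussian_kde(healthy)(x)
    den = num + (1 - prior_healthy) * gaussian_kde(diseased)(x)
    # Fall back to the prior when both densities underflow to zero.
    return num / den if den > 0 else prior_healthy


def reliability(readings, groups):
    """Product over indicators of the probability of belonging to the healthy group."""
    result = 1.0
    for name, value in readings.items():
        healthy, diseased = groups[name]
        result *= prob_healthy(value, healthy, diseased)
    return result


# Hypothetical (healthy, diseased) samples per indicator, in mmHg.
groups = {
    "SBP": ([110, 115, 118, 120, 122, 125], [140, 145, 150, 155, 160, 165]),
    "DBP": ([70, 72, 75, 78, 80, 82], [90, 92, 95, 98, 100, 105]),
}
before = reliability({"SBP": 138, "DBP": 88}, groups)
after = reliability({"SBP": 124, "DBP": 79}, groups)
print(f"reliability before: {before:.4f}, after: {after:.4f}")
```

Comparing the value before and after a simulated shift in the indicator readings is one way to express the effect of a maintenance activity such as exercise as a change in reliability.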