The Research of Survival Analysis with Data Mining Technology

(Chinese title: 運用資料探勘技術應用於存活分析之研究)


Abstract


Breast cancer ranks fourth among all cancers in Taiwan and poses a serious threat to the lives of Taiwanese women. Breast tumor diagnosis is one of the most effective ways to detect breast cancer early, and predicting patient survival is an important but challenging task. This study uses a breast cancer database together with data mining techniques to build a five-year survival prediction model for breast cancer patients and compares its accuracy with that of traditional statistical methods. The aim is to give professional medical teams more proactive reference information on patient survival to support their treatment decisions. Several classification models are used to select the important variables for breast cancer survival; among them, multivariate adaptive regression splines (MARS) achieves the best classification results with the fewest variables. An integrated model that combines MARS with an artificial neural network (ANN), here a back-propagation network (BPN), therefore reduces the number of input dimensions without losing discriminative ability. The integrated model also greatly reduces computation time, and the network converges more easily. Nonparametric paired comparisons using Friedman's rank test and the Wilcoxon signed-rank test show that the overall discrimination rate of the integrated MARS-ANN model differs significantly from that of every other model. The analytic results demonstrate that the integrated model predicts the survival of breast cancer patients more accurately, which in practice can help provide patients with more appropriate and timely treatment. The integrated model also identifies important variables that can serve as a reference for medical research, and it yields the greatest benefit in data collection and model construction.
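The paired comparison of discrimination rates summarized above (Friedman's rank test across all models, then Wilcoxon signed-rank tests between the integrated model and each competitor) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes every model's discrimination rate is measured on the same blocks (for example, the same cross-validation folds), uses the uncorrected Friedman statistic with a chi-square approximation, and computes an exact two-sided Wilcoxon p-value. The function names, model labels, and demo numbers are hypothetical placeholders, not values from the study.

```python
import math


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Chi-square upper tail: 1 - P(df/2, x/2), using the incomplete-gamma series."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    n = 1
    while term > 1e-15 * total and n < 10_000:
        term *= z / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


def friedman_test(scores):
    """scores[b][m] = discrimination rate of model m on block b.

    Returns (statistic, p-value); no tie correction is applied.
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for m, r in enumerate(average_ranks(row)):
            rank_sums[m] += r
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return stat, chi2_sf(stat, k - 1)


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples x and y."""
    # Round away float noise, then drop zero differences.
    diffs = [d for d in (round(a - b, 10) for a, b in zip(x, y)) if d != 0]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    # Under H0 each rank carries a + or - sign with probability 1/2.
    # Doubling the ranks turns tied half-integer ranks into integers for the count.
    counts = {0: 1}
    for r in (int(round(2 * r)) for r in ranks):
        nxt = dict(counts)
        for total, c in counts.items():
            nxt[total + r] = nxt.get(total + r, 0) + c
        counts = nxt
    tail = sum(c for total, c in counts.items() if total <= round(2 * w))
    return w, min(1.0, 2.0 * tail / 2 ** n)


if __name__ == "__main__":
    # Hypothetical per-fold discrimination rates (made-up placeholders, NOT study results).
    rates = {
        "MARS-ANN": [0.91, 0.89, 0.92, 0.90, 0.93],
        "Model B": [0.88, 0.87, 0.90, 0.86, 0.89],
        "Model C": [0.85, 0.86, 0.87, 0.84, 0.88],
    }
    blocks = [list(row) for row in zip(*rates.values())]
    stat, p = friedman_test(blocks)
    print(f"Friedman chi2 = {stat:.3f}, p = {p:.4f}")
    for other in ["Model B", "Model C"]:
        w, p = wilcoxon_signed_rank(rates["MARS-ANN"], rates[other])
        print(f"MARS-ANN vs {other}: W = {w}, p = {p:.4f}")
```

For real analyses, `scipy.stats.friedmanchisquare` and `scipy.stats.wilcoxon` are tested implementations; SciPy's Friedman statistic includes a tie correction, so its result can differ slightly when ranks tie. Note also that with only five blocks the smallest attainable exact two-sided Wilcoxon p-value is 2/2^5 = 0.0625, so pairwise significance at the 0.05 level requires more blocks.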


