  • Thesis

A Study of Statistical Methods for Multivariate ROC Curves

Multivariate approach for ROC curve

Advisor: 吳建華

Abstract


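This thesis investigates how the receiver operating characteristic (ROC) curve can be used to determine whether adding variables improves the accuracy of disease prediction. Our classification method computes a classification index from known patient data and, according to this index, assigns each patient to either the diseased or the non-diseased group. In real patient data these variables may be normally or non-normally distributed, so we carry out the calculations for both normal and non-normal data. Classifying patients on a single variable is prone to misclassification, so we add a new variable to improve the prediction accuracy for the disease and compare the univariate and bivariate cases. To compute the ROC curve, we apply two methods (the general method and the normal method) to both the normal and the non-normal data, to examine whether the choice of ROC computation method for a given data type affects the accuracy of disease prediction. We then locate the cut-off point with two different criteria (MAX and MIN); the cut-off points found by the two criteria also give different prediction accuracies. Our results show that, for normally distributed data, when the ROC curve is computed with the normal method and the cut-off point is found with MIN, adding a new variable improves the accuracy of disease prediction.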
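The abstract names two ways of computing the ROC curve without defining them. The following is a minimal sketch, assuming the "general method" is the nonparametric (empirical) ROC analysis and the "normal method" is the binormal model, in which each group's values are taken to be normally distributed. The function names are hypothetical and are not taken from the thesis.

```python
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0, sd 1

def empirical_auc(diseased, healthy):
    """ASSUMED 'general' method: nonparametric AUC, i.e. the share of
    (diseased, healthy) pairs in which the diseased patient scores higher;
    ties count one half."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))

def binormal_params(diseased, healthy):
    """ASSUMED 'normal' method: scores are N(mu1, s1^2) when diseased and
    N(mu0, s0^2) when healthy. Returns intercept a and slope b of the curve."""
    mu1, s1 = mean(diseased), stdev(diseased)
    mu0, s0 = mean(healthy), stdev(healthy)
    return (mu1 - mu0) / s1, s0 / s1

def binormal_roc(t, a, b):
    """True-positive rate at false-positive rate t (0 < t < 1):
    TPR = Phi(a + b * Phi^{-1}(t))."""
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(t))

def binormal_auc(diseased, healthy):
    """Area under the binormal curve: Phi(a / sqrt(1 + b^2)),
    which equals Phi((mu1 - mu0) / sqrt(s0^2 + s1^2))."""
    a, b = binormal_params(diseased, healthy)
    return STD_NORMAL.cdf(a / (1 + b * b) ** 0.5)
```

With diseased = [2, 3, 4] and healthy = [1, 2, 3], the empirical AUC is 7/9 (about 0.778), while the binormal AUC is Φ(1/√2), about 0.760. The two methods can therefore disagree on the same data, which is the kind of difference the comparison above is designed to detect.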

Parallel Abstract (English)


This thesis explores how to use the ROC curve to decide whether adding a new variable increases the accuracy of disease prediction. We use known patient data to calculate a classification index and, according to this index, classify each patient into one of two groups: the sick group or the healthy group. The data may or may not follow a normal distribution, so we consider both situations. Classifying patients with only one variable easily leads to mistakes, so we add a new variable in the expectation that it will increase the accuracy of disease prediction, and we compare the one-variable and two-variable cases. To calculate the ROC curve we use two methods, which we call the standard way and the normal way, on each of the two data patterns, to observe whether using a different method leads to different results. After this calculation we use two criteria, called MIN and MAX, to find the cut-off point, and again observe whether the different criteria lead to different results. Our results show that, for data following a normal distribution, computing the ROC curve with the normal way and then finding the cut-off point with MIN allows a new variable to increase the accuracy of disease prediction.
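Neither abstract defines the MAX and MIN cut-off criteria or how the two variables are combined into one index. The sketch below is illustrative only. It assumes MAX means maximizing Youden's index (sensitivity + specificity - 1), and MIN means minimizing the distance from the ROC point to the perfect classifier at (0, 1). It also folds the second variable into a single score with Fisher's linear discriminant, a swapped-in technique that may differ from the thesis's own index. All function names are hypothetical.

```python
from statistics import mean

def cutoff_points(diseased, healthy):
    """For each candidate cut-off c (classify as diseased when score >= c),
    return (c, sensitivity, specificity)."""
    rows = []
    for c in sorted(set(diseased) | set(healthy)):
        sens = sum(d >= c for d in diseased) / len(diseased)
        spec = sum(h < c for h in healthy) / len(healthy)
        rows.append((c, sens, spec))
    return rows

def cutoff_max(diseased, healthy):
    """ASSUMED 'MAX' rule: the cut-off maximizing Youden's J = sens + spec - 1."""
    return max(cutoff_points(diseased, healthy), key=lambda r: r[1] + r[2] - 1)[0]

def cutoff_min(diseased, healthy):
    """ASSUMED 'MIN' rule: the cut-off whose ROC point (1 - spec, sens) is
    closest (squared Euclidean distance) to the perfect classifier at (0, 1)."""
    return min(cutoff_points(diseased, healthy),
               key=lambda r: (1 - r[2]) ** 2 + (1 - r[1]) ** 2)[0]

def fisher_index(diseased, healthy):
    """HYPOTHETICAL bivariate index via Fisher's linear discriminant.
    Each patient is a pair (x, y); returns score(p) = w . p with
    w = S_pooled^{-1} (mean_diseased - mean_healthy)."""
    def moments(group):
        mx, my = mean(p[0] for p in group), mean(p[1] for p in group)
        sxx = sum((p[0] - mx) ** 2 for p in group)
        syy = sum((p[1] - my) ** 2 for p in group)
        sxy = sum((p[0] - mx) * (p[1] - my) for p in group)
        return mx, my, sxx, syy, sxy

    m1x, m1y, a1, b1, c1 = moments(diseased)
    m0x, m0y, a0, b0, c0 = moments(healthy)
    dof = len(diseased) + len(healthy) - 2
    sxx, syy, sxy = (a1 + a0) / dof, (b1 + b0) / dof, (c1 + c0) / dof
    det = sxx * syy - sxy ** 2  # must be nonzero (non-degenerate data)
    dx, dy = m1x - m0x, m1y - m0y
    # Closed-form inverse of the 2x2 pooled covariance applied to the mean difference.
    wx = (syy * dx - sxy * dy) / det
    wy = (sxx * dy - sxy * dx) / det
    return lambda p: wx * p[0] + wy * p[1]
```

To mirror the one-variable versus two-variable comparison, one could apply `cutoff_min` or `cutoff_max` to the first variable alone. Then apply the same rule to the scores `[f(p) for p in group]` produced by `f = fisher_index(...)`, and compare the sensitivity and specificity at each chosen cut-off. The discriminant scores diseased patients higher on average, so the same "score >= c" rule applies to both cases.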
