
Optimizing Specificity under Perfect Sensitivity for Computer-Aided Medical Data Analysis

Advisor: 林守德

Abstract


In this thesis we propose a novel evaluation criterion for medical data mining: specificity under perfect sensitivity (SUPS). The criterion assesses how well a classification model can guarantee that the instances it predicts as negative are truly negative. We argue that it is particularly useful in medical data mining, where the penalty for a false negative is so high that no false negatives should be allowed. We further propose two strategies to help a classifier attain a higher SUPS. The first, suspicion expansion, loosens the definition of positive by relabeling negative instances that lie close to a positive instance as suspicious, thereby strengthening the confidence in the remaining predicted negatives. The second, false positive tolerance, tolerates instances of positive patients that are misclassified as negative in order to reduce the false negative rate at the patient level. Experimental results show that, compared with the original classifiers, our methods significantly improve specificity under perfect sensitivity.
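To make the criterion concrete, the following is a minimal sketch (not taken from the thesis) of how specificity under perfect sensitivity could be computed from a classifier's real-valued scores: the decision threshold is lowered just far enough that every true positive is predicted positive, and the specificity measured at that threshold is reported. The function name `sups`, the score convention (higher means more likely positive), and the 0/1 label encoding are illustrative assumptions.

```python
import numpy as np

def sups(scores, labels):
    """Specificity under perfect sensitivity: pick the most permissive
    threshold that still classifies every true positive as positive,
    then report the specificity at that threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)      # 1 = positive, 0 = negative

    # Any score >= this threshold is predicted positive, so no positive
    # instance falls below it and sensitivity is exactly 1.
    threshold = scores[labels == 1].min()

    negatives = scores[labels == 0]
    true_negatives = np.sum(negatives < threshold)
    return true_negatives / len(negatives)      # specificity with zero false negatives

# Toy usage: all three negatives score below the lowest positive score (0.40),
# so the specificity under perfect sensitivity is 1.0.
print(sups([0.95, 0.80, 0.40, 0.30, 0.20, 0.10], [1, 1, 1, 0, 0, 0]))
```

Under this reading, SUPS is simply the specificity at the point on the ROC curve where the true positive rate first reaches 1.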

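The first strategy can likewise be illustrated with a small relabeling routine: negative instances that lie close to some positive instance are marked as suspicious, so the instances that remain labeled negative can be predicted negative with higher confidence. This is only a sketch under assumed details (Euclidean distance, a fixed radius, and a third label value for "suspicious"); the thesis's actual expansion rule may differ.

```python
import numpy as np

POSITIVE, NEGATIVE, SUSPICIOUS = 1, 0, 2   # illustrative label encoding

def expand_suspicion(X, y, radius):
    """Suspicion expansion sketch: relabel every negative instance that
    lies within `radius` of at least one positive instance as suspicious."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int).copy()

    pos, neg_idx = X[y == POSITIVE], np.flatnonzero(y == NEGATIVE)
    # Pairwise Euclidean distances from each negative to each positive.
    dists = np.linalg.norm(X[neg_idx][:, None, :] - pos[None, :, :], axis=2)
    near_positive = dists.min(axis=1) <= radius

    y[neg_idx[near_positive]] = SUSPICIOUS
    return y

# Toy usage: the negative at 1.1 sits within radius 0.5 of the positive at 1.0
# and becomes suspicious; the negative at 3.0 stays a confident negative.
X = np.array([[1.0], [1.1], [3.0]])
y = np.array([POSITIVE, NEGATIVE, NEGATIVE])
print(expand_suspicion(X, y, radius=0.5))   # -> [1 2 0]
```

During training, the suspicious group could, for instance, be merged into the positive class or excluded from the negatives, matching the abstract's stated goal of enhancing confidence in the predicted negative data.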

