
Theories and Enhancement of Multi-layer Classifiers with Applications to Cancer Diagnosis

Advisor: 陳正剛

Abstract


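It is generally believed that traditional classification trees such as CART can efficiently classify certain types of data; in practice, because of the way the algorithm computes its splits, it does not always classify efficiently. The structure of the multi-layer classifier differs from that of the traditional classification tree: each layer has exactly three nodes, two holding data that have already been classified and one holding unclassified data, and the unclassified node is then split on another attribute to open a new layer. Although the multi-layer classifier remedies some shortcomings of CART, it still has limitations in certain situations. This research uses a simple two-class, two-attribute data set to examine the theoretical discriminating capability of the traditional classification tree and the multi-layer classifier, and finds that neither method always produces the most efficient and most easily interpretable model. For one specific type of class distribution, we examine the properties of each classifier and present hypothetical examples that illustrate their characteristics and shortcomings. The theoretical analysis shows that the multi-layer classifier falls short exactly where the traditional classification tree performs best. This research therefore extends the multi-layer classifier with a new algorithm that incorporates the concept of the traditional classification tree. Whenever a node enters the algorithm, it may be split into the two nodes of a traditional classification tree or into the two or three nodes of the multi-layer classifier, and every node in a layer may be split further. Because a three-node split has lower impurity than a two-node split in most cases, a three-node split is accepted only if a statistical test shows that it is worthwhile. Tests on simulated cases and a real case confirm that the new algorithm improves the classification capability of the multi-layer classifier.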

Parallel Abstract


It is generally believed that traditional classification trees, such as Classification and Regression Trees (CART), can effectively classify certain types of data distributions. In fact, because of the criterion and the procedure used by the traditional classification tree, we can show that it is not always as efficient as expected. The structure of the multi-layer classifier is different from the traditional tree structure. Each layer consists of three nodes, of which two hold completely classified data and one holds unclassified data. The node with unclassified data is then further split into a new layer of three nodes until a stopping criterion is reached. In this research, we use a simple two-class data set with two attributes to discuss the properties and discriminating capabilities of the traditional classification tree and the multi-layer classifier. We show that neither the traditional classification tree nor the multi-layer classifier can always construct the most effective and easily interpretable model for a certain type of data distribution. Based on the theoretical discussion in this research, we find that the shortcomings of the multi-layer classifier can be remedied by incorporating the splitting methods of the traditional classification tree. Therefore, we propose an enhanced algorithm for the multi-layer classifier that embeds the concept of the binary classification tree. When a node is to be split, it can be split into two or three nodes. Because the impurity of a three-node split is likely to be lower than that of a two-node split, a three-node split must first pass a statistical test. The algorithm is tested on simulated cases and a real case to verify and demonstrate its superior discriminating capability over the traditional classification tree and the multi-layer classifier.
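The observation that a three-node split tends to have lower impurity than a two-node split can be illustrated with a small sketch. The abstract does not say which impurity measure or which statistical test the thesis uses, so the code below assumes Gini impurity and thresholds on a single numeric attribute, and it leaves the test out entirely. The function names (`gini`, `weighted_impurity`, `partition`, `best_split`) and the toy data are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from itertools import combinations


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_impurity(parts):
    """Size-weighted average Gini impurity over the child nodes of a split."""
    total = sum(len(p) for p in parts)
    return sum(len(p) / total * gini(p) for p in parts)


def partition(values, labels, cuts):
    """Distribute labels into len(cuts) + 1 nodes using sorted thresholds."""
    parts = [[] for _ in range(len(cuts) + 1)]
    for v, y in zip(values, labels):
        # The node index is the number of thresholds the value exceeds.
        parts[sum(v > c for c in cuts)].append(y)
    return parts


def best_split(values, labels, n_cuts):
    """Exhaustive search over midpoint thresholds; returns (impurity, cuts)."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for cuts in combinations(candidates, n_cuts):
        score = weighted_impurity(partition(values, labels, cuts))
        if best is None or score < best[0]:
            best = (score, cuts)
    return best


# Toy data (an assumption): two pure regions with a mixed band between them.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
labels = ["A", "A", "A", "A", "B", "A", "B", "B", "B", "B"]

score2, cuts2 = best_split(values, labels, n_cuts=1)  # two-node split
score3, cuts3 = best_split(values, labels, n_cuts=2)  # three-node split
print(score2, cuts2)
print(score3, cuts3)
```

On this toy data the best three-node split isolates the mixed band as its middle node and reaches a lower weighted impurity than any single threshold. An extra cut-point can never make the best achievable impurity worse, which is why a comparison of raw impurities alone would favor three nodes and why the abstract calls for a statistical test before a three-node split is accepted.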

