Applying the Removal of the Irrelevant Values Problem in Decision Trees to Medical Research

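With the widespread use of medical information systems, the volume of data stored in databases has grown substantially. If we can analyze existing medical records to find how various symptoms correlate with a particular disease, and from that infer the regular relationships among them, we can assist physicians during diagnosis and thereby improve the quality of care. As technology has advanced, handwritten medical records have been replaced by records stored on computers. In recent years, progress in software and hardware has allowed records that were originally mostly plain text to incorporate multimedia data types such as images and digital signals, forming multimedia medical databases. Whether the information is the stored record itself or the medical images and physiological signals it contains, such databases let physicians grasp patient data more effectively. This is of considerable value for both clinical and basic medical research, and it also helps patients receive better care. For these reasons, advanced countries in Europe and North America, as well as Japan, have all conducted extensive research on integrated medical information systems, and most healthcare systems at home and abroad have built their own database management systems to speed the flow of information among patients, physicians, and hospitals.

Among data mining techniques, the irrelevant values problem in decision trees is the focus of this paper. When a decision tree is represented by a set of rules, the antecedents of individual rules may contain irrelevant conditions. When these rules are applied to medical examinations, the irrelevant conditions may place an unnecessary burden on the patient and on society. To avoid generating rules with irrelevant conditions, we propose a new algorithm that removes the irrelevant conditions of rules during the conversion of the decision tree to rules, based on the information on the decision tree. Our algorithm can handle not only discrete values but also continuous values.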

Keywords

Decision tree; irrelevant values problem; missing branches problem; medical examination

Parallel Abstract

The decision tree is one of the key data mining techniques and has been applied to medical applications. A decision tree is built by selecting the best test attribute as its root; the same procedure is then applied to each branch to induce the remaining levels until all examples in a leaf belong to the same class. However, because the decision tree creates a branch for each value of the test attribute that appears in the training data, without considering whether that value is relevant to the classification, the resulting tree may suffer from over-specialization. Without loss of generality, we consider only ID3-like algorithms in this paper. As pointed out by J. Cheng, the irrelevant values problem and the missing branches problem are two causes of over-specialization in decision trees. The missing branches problem arises because some of the reduced subsets at non-leaf nodes do not necessarily contain examples of every possible value of the branching attribute; consequently, the decision tree may fail to classify some instances. The irrelevant values problem arises because some values of the branching attribute may not be relevant to the classification, so the rules derived from the tree may contain irrelevant conditions, which demand that extra information be supplied. In a medical setting, extra information means extra examinations for the patient, and extra examinations mean more expense and a greater burden on the patient and society. To save medical resources and avoid unnecessary examinations, we therefore have to deal with irrelevant conditions when the decision tree is applied to medical applications. In particular, when a decision tree is represented by a collection of rules, the antecedents of individual rules may contain irrelevant conditions.
Therefore, to avoid generating rules with irrelevant conditions, we propose a new algorithm that removes the irrelevant conditions of rules while converting the decision tree to rules, according to the information on the decision tree. Our algorithm can handle not only discrete values but also continuous values.
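
To make the irrelevant values problem concrete, the following is a minimal Python sketch; it is not the algorithm proposed in this paper. It builds an ID3-like tree on a small hypothetical data set, converts the tree to rules, and drops a condition whenever the shortened rule still covers only training examples of the rule's class. The attributes `symptom` and `test`, the class labels, and all function names are illustrative assumptions. Note also that this naive check consults the training data, whereas the proposed algorithm is described as working from the information on the decision tree.

```python
import math
from collections import Counter

# Hypothetical training data: two attributes and a class label.
rows = [
    {"symptom": "severe", "test": "neg", "label": "sick"},
    {"symptom": "severe", "test": "pos", "label": "sick"},
    {"symptom": "mild", "test": "neg", "label": "healthy"},
    {"symptom": "mild", "test": "pos", "label": "sick"},
    {"symptom": "none", "test": "neg", "label": "healthy"},
    {"symptom": "none", "test": "pos", "label": "sick"},
]

def entropy(data):
    """Shannon entropy of the class labels in data."""
    counts = Counter(r["label"] for r in data)
    return -sum(c / len(data) * math.log2(c / len(data)) for c in counts.values())

def gain(data, attr):
    """Information gain obtained by splitting data on attr."""
    remainder = 0.0
    for value in {r[attr] for r in data}:
        subset = [r for r in data if r[attr] == value]
        remainder += len(subset) / len(data) * entropy(subset)
    return entropy(data) - remainder

def build(data, attrs):
    """ID3-like induction: a leaf is a label, a node is (attr, {value: subtree})."""
    labels = Counter(r["label"] for r in data)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(data, a))
    rest = [a for a in attrs if a != best]
    # One branch per value seen in this subset, relevant or not.
    return (best, {v: build([r for r in data if r[best] == v], rest)
                   for v in sorted({r[best] for r in data})})

def to_rules(tree, conds=()):
    """Turn each root-to-leaf path into a (conditions, label) rule."""
    if not isinstance(tree, tuple):
        return [(dict(conds), tree)]
    attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules += to_rules(subtree, conds + ((attr, value),))
    return rules

def drop_irrelevant(rule, data):
    """Naive check: drop a condition if the shorter rule stays consistent with data."""
    conds, label = dict(rule[0]), rule[1]
    for attr in list(conds):
        trial = {k: v for k, v in conds.items() if k != attr}
        covered = [r for r in data if all(r[k] == v for k, v in trial.items())]
        if all(r["label"] == label for r in covered):
            conds = trial
    return conds, label

tree = build(rows, ["symptom", "test"])
rules = to_rules(tree)
for rule in rules:
    print(rule, "->", drop_irrelevant(rule, rows))
```

In this toy data set, the rule for patients with severe symptoms initially also requires a negative test result. The check drops that condition, so such a patient could be classified without undergoing the test, which is the kind of saving the abstract argues for.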

Parallel Keywords

Decision tree; classification; the irrelevant values problem; the missing branches problem; medical examination

References

1. Ding-An Chiang, Wei Chen, Yi-Fan Wang, and Chen-Fang Hsu, "The Irrelevant Values Problem in the ID3 Tree," Computers and Artificial Intelligence, Vol. 19, 2000, pp. 169-182.

2. Adjeroh and K. C. Nwosu, "Multimedia database management requirements and issues," IEEE Multimedia, July-September 1997, pp. 24-33.

3. Demsar, J., Zupan, B., Aoki, N., "Feature Mining and Predictive Model Construction from Severe Trauma Patient Data," Journal of Medical Informatics, Vol. 63, 2001, pp. 41-50.

5. Ragavan, L. Rendell, M. Shaw, and A. Tessmer, "Lookahead feature construction for learning hard concepts," Proc. 10th Intern. Conf. on Machine Learning, June 1993, pp. 252-259.

10. J. R. Quinlan, "Induction of decision trees," Machine Learning, Vol. 1, 1986, pp. 81-106.

Cited By

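黃安傑 (2011). Building a patient outpatient decision system using data mining methods [Master's thesis, National Taipei University of Technology]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0006-1407201111332400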
