
Introducing adaptive-bandwidth kernel density estimation to construct probability estimation trees

Combining decision tree and adaptive kernel density estimation to construct probability estimation tree

Advisor: 歐陽彥正

Abstract


Among machine learning methods, the decision tree differs from the many methods that are black-box models: its flow-chart structure makes it highly interpretable. However, a decision tree provides only coarse class probability estimates. The most widely known estimators are the frequency estimate and the Laplace estimate, but both assign the same probability to every test instance that falls into a given node, so differences between individuals within the node cannot be expressed. A probability estimation tree instead provides a differentiated probability estimate for each individual within a node. In this work we construct a probability estimation tree with adaptive kernel density estimation and evaluate it on six different simulated data sets; compared with fixed-bandwidth kernel density estimation, the best improvement in the fuzzy zone between the two classes is an error reduction of about 31%. In application, the probability estimation tree approach to dengue fever detection in public health can help physicians understand, in a shorter time, the situation of a patient predicted to have dengue fever.
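For reference, a standard formulation of the two node-level estimators mentioned above (common textbook notation, not taken verbatim from the thesis): for a leaf $t$ containing $n$ training instances, of which $n_c$ belong to class $c$, with $C$ classes in total,

$$\hat{P}_{\text{freq}}(c \mid t) = \frac{n_c}{n}, \qquad \hat{P}_{\text{Laplace}}(c \mid t) = \frac{n_c + 1}{n + C}.$$

Both formulas return a single value per leaf, which is why every test instance reaching the same leaf receives the same estimate.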

Parallel Abstract


Unlike many machine learning methods, which are black-box models, the flow chart of a decision tree has the advantage of high interpretability. However, a decision tree produces poor class probability estimates. Among probability estimation methods, the frequency estimate and the Laplace estimate are the most widely known, but both can only assign the same probability to all of the testing data in a node. A probability estimation tree provides a differentiated probability for every individual within a node. In our study, which combines a decision tree with adaptive kernel density estimation, the best improvement over fixed-bandwidth kernel density estimation in the fuzzy zone between the two groups, across six different simulated data sets, is an error reduction of approximately 31%. In application, such machine learning methods for dengue fever detection can support doctors in grasping, in a shorter period of time, the situation of patients predicted to have dengue.
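The abstract does not spell out how the adaptive-bandwidth density estimates are turned into per-instance leaf probabilities. The sketch below is one plausible reading, assuming a one-dimensional feature, a Gaussian kernel, Abramson's square-root bandwidth rule, and Bayes' rule inside the leaf; the function name adaptive_kde_leaf_probability and the parameters h0 and alpha are illustrative, not the thesis implementation.

import numpy as np

def adaptive_kde_leaf_probability(x, leaf_pos, leaf_neg, h0=1.0, alpha=0.5):
    # Per-instance probability that a test value x falling into a leaf belongs
    # to the positive class, using adaptive-bandwidth (Abramson-style) KDE.
    def adaptive_kde(x, samples, h0, alpha):
        samples = np.asarray(samples, dtype=float)
        # Pilot density at each training point: fixed bandwidth h0, Gaussian kernel.
        pilot = np.array([
            np.mean(np.exp(-0.5 * ((s - samples) / h0) ** 2)) / (h0 * np.sqrt(2 * np.pi))
            for s in samples
        ])
        # Local bandwidths: wider where the pilot density is low
        # (alpha = 0.5 gives the square-root law).
        g = np.exp(np.mean(np.log(pilot)))      # geometric mean of the pilot density
        h = h0 * (pilot / g) ** (-alpha)
        # Adaptive-bandwidth density estimate at the test value x.
        return np.mean(np.exp(-0.5 * ((x - samples) / h) ** 2) / (h * np.sqrt(2 * np.pi)))

    f_pos = adaptive_kde(x, leaf_pos, h0, alpha)
    f_neg = adaptive_kde(x, leaf_neg, h0, alpha)
    prior_pos = len(leaf_pos) / (len(leaf_pos) + len(leaf_neg))
    # Bayes' rule inside the leaf: the estimate now varies with x instead of
    # being one value shared by every instance in the node.
    return f_pos * prior_pos / (f_pos * prior_pos + f_neg * (1.0 - prior_pos))

A minimal usage example with synthetic leaf data:

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, size=50)    # positive-class training values in this leaf
neg = rng.normal(-1.0, 1.0, size=50)   # negative-class training values in this leaf
print(adaptive_kde_leaf_probability(0.2, pos, neg))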

