
Introducing adaptive-bandwidth kernel density estimation to construct probability estimation trees

Combining decision tree and adaptive kernel density estimation to construct probability estimation tree

Advisor: 歐陽彥正

Abstract


Among machine learning methods, the decision tree differs from the many methods that are black-box models: its flow-chart structure makes it highly interpretable. However, a decision tree provides only coarse class probability estimates. The most widely known estimators are the frequency estimate and the Laplace estimate, but both assign the same probability to every test instance that falls into a given node, so differences between individuals within the node cannot be expressed. A probability estimation tree instead provides a differentiated probability estimate for each individual within a node. In this work we construct a probability estimation tree with adaptive kernel density estimation and evaluate it on six different simulated data sets; compared with fixed-bandwidth kernel density estimation, the best improvement in the fuzzy zone between the two classes is an error reduction of about 31%. In application, the probability estimation tree approach to dengue fever detection in public health can help physicians understand, in a shorter time, the situation of a patient predicted to have dengue fever.
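For reference, a standard formulation of the two node-level estimators mentioned above (common textbook notation, not taken verbatim from the thesis): for a leaf $t$ containing $n$ training instances, of which $n_c$ belong to class $c$, with $C$ classes in total,

$$\hat{P}_{\text{freq}}(c \mid t) = \frac{n_c}{n}, \qquad \hat{P}_{\text{Laplace}}(c \mid t) = \frac{n_c + 1}{n + C}.$$

Both formulas return a single value per leaf, which is why every test instance reaching the same leaf receives the same estimate.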

Parallel Abstract


Unlike many machine learning methods, which are black-box models, the flow chart of a decision tree has the advantage of high interpretability. However, a decision tree produces poor class probability estimates. Among probability estimation methods, the frequency estimate and the Laplace estimate are the most widely known, but both can only assign the same probability to all of the testing data in a node. A probability estimation tree provides a differentiated probability for every individual within a node. In our study, which combines a decision tree with adaptive kernel density estimation, the best improvement over fixed-bandwidth kernel density estimation in the fuzzy zone between the two groups, across six different simulated data sets, is an error reduction of approximately 31%. In application, such machine learning methods for dengue fever detection can support doctors in grasping, in a shorter period of time, the situation of patients predicted to have dengue.
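The abstract does not spell out how the adaptive-bandwidth density estimates are turned into per-instance leaf probabilities. The sketch below is one plausible reading, assuming a one-dimensional feature, a Gaussian kernel, Abramson's square-root bandwidth rule, and Bayes' rule inside the leaf; the function name adaptive_kde_leaf_probability and the parameters h0 and alpha are illustrative, not the thesis implementation.

import numpy as np

def adaptive_kde_leaf_probability(x, leaf_pos, leaf_neg, h0=1.0, alpha=0.5):
    # Per-instance probability that a test value x falling into a leaf belongs
    # to the positive class, using adaptive-bandwidth (Abramson-style) KDE.
    def adaptive_kde(x, samples, h0, alpha):
        samples = np.asarray(samples, dtype=float)
        # Pilot density at each training point: fixed bandwidth h0, Gaussian kernel.
        pilot = np.array([
            np.mean(np.exp(-0.5 * ((s - samples) / h0) ** 2)) / (h0 * np.sqrt(2 * np.pi))
            for s in samples
        ])
        # Local bandwidths: wider where the pilot density is low
        # (alpha = 0.5 gives the square-root law).
        g = np.exp(np.mean(np.log(pilot)))      # geometric mean of the pilot density
        h = h0 * (pilot / g) ** (-alpha)
        # Adaptive-bandwidth density estimate at the test value x.
        return np.mean(np.exp(-0.5 * ((x - samples) / h) ** 2) / (h * np.sqrt(2 * np.pi)))

    f_pos = adaptive_kde(x, leaf_pos, h0, alpha)
    f_neg = adaptive_kde(x, leaf_neg, h0, alpha)
    prior_pos = len(leaf_pos) / (len(leaf_pos) + len(leaf_neg))
    # Bayes' rule inside the leaf: the estimate now varies with x instead of
    # being one value shared by every instance in the node.
    return f_pos * prior_pos / (f_pos * prior_pos + f_neg * (1.0 - prior_pos))

A minimal usage example with synthetic leaf data:

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, size=50)    # positive-class training values in this leaf
neg = rng.normal(-1.0, 1.0, size=50)   # negative-class training values in this leaf
print(adaptive_kde_leaf_probability(0.2, pos, neg))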

