Piecewise-Linear Regression Tree

Sample-Efficient Regression Trees for Attributes with Mixed Continuous and Discrete Effects: A Piecewise-Linear Regression Tree

Advisor: 陳正剛

Abstract


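Regression trees can handle either categorical or continuous responses. When a regression tree selects attributes, however, repeatedly partitioning the data shrinks the sample size rapidly and makes the estimates unreliable. Sample-Efficient Regression Trees were developed to address this rapid loss of sample size. Yet when the data contain continuous effects, variant continuous effects, or mixed effects, neither regression trees nor Sample-Efficient Regression Trees can handle them. To handle continuous effects, we combine stepwise regression with the Sample-Efficient Regression Tree approach. For variant continuous effects, we propose an attribute-selection method that considers an attribute's continuous effect and discrete effect simultaneously. Finally, for mixed effects, we consider not only the effect of a single attribute but also how the attribute selected at the next level below it affects the explanatory power of the whole model. To validate the proposed methods, we compare them with several other methods on simulated data and on a real case concerning body fat. The comparisons show that the proposed methods handle continuous effects, variant continuous effects, and mixed effects effectively.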

Parallel Abstract


Classification and regression trees (CART) are a decision-tree technique for categorical or continuous responses. A shortcoming of the regression tree is that its splitting procedure depletes the sample size quickly. Sample-Efficient Regression Trees (SERT) were developed to address this sample-depletion issue. However, both SERT and CART can only select attributes with discrete effects; attributes with continuous effects, variant continuous effects, or mixed effects are not selected into the tree model. In this research, we integrate stepwise regression with the sample-efficient regression tree approach to select attributes with continuous effects. For attributes with variant continuous effects, we propose a method that considers the continuous effect and the discrete effect of an attribute simultaneously. For attributes with mixed effects, we consider not only the effect of the attribute itself but also that of the attributes selected subsequently. To validate the proposed methods, we test the proposed tree on simulated data with continuous effects, variant continuous effects, and mixed effects. A real case on the body density of 252 men is also studied. The simulated data and the real case show that the new decision tree selects attributes that other decision trees fail to select and builds a more robust tree model with more accurately estimated attribute effects.
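The distinction the abstract draws between a discrete effect and a continuous effect can be illustrated with a small sketch. It is not the selection procedure of the thesis, which is not described here in enough detail to reproduce. It only assumes that a discrete effect looks like a step (one mean on each side of a threshold) and a continuous effect looks like a straight line, and it compares the two by residual sum of squares. All function names are hypothetical, and a real criterion would also have to account for the different number of fitted parameters.

```python
from statistics import mean


def sse_constant(y):
    """Residual sum of squares around the mean of y."""
    m = mean(y)
    return sum((v - m) ** 2 for v in y)


def sse_linear(x, y):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((u - mx) ** 2 for u in x)
    if sxx == 0:
        # x is constant, so the best line is just the mean of y.
        return sse_constant(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    a = my - b * mx
    return sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))


def best_split_sse(x, y):
    """Best binary split x <= t with one constant mean per side.

    Returns (sse, threshold); threshold is None if no split helps.
    """
    pairs = sorted(zip(x, y))
    best = (sse_constant(y), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        sse = sse_constant(left) + sse_constant(right)
        if sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best


def describe_effect(x, y):
    """Label the attribute's effect by whichever simple model fits better.

    Assumption for illustration only: raw SSE comparison, ties go to the line.
    """
    linear = sse_linear(x, y)
    step, threshold = best_split_sse(x, y)
    kind = "continuous" if linear <= step else "discrete"
    return {"kind": kind, "sse_linear": linear,
            "sse_step": step, "threshold": threshold}


if __name__ == "__main__":
    x = list(range(10))
    print(describe_effect(x, [2 * u + 1 for u in x]))   # response linear in x
    print(describe_effect(x, [0] * 5 + [10] * 5))       # response jumps once
```

On a response that is linear in the attribute, a single split leaves substantial residual variation while the line fits closely; on a response that jumps once, the opposite holds. A tree that scores attributes only by splits would rank the first kind of attribute poorly, which is the gap the abstract describes.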


