A tree-based interpretable predictive method with FDR and type-one error control (具有錯誤發現率和型一誤差控制的可解釋之預測樹模型)

Advisor: 歐陽彥正

Abstract

In practice, many variables are typically available, but we do not know which of them carry true signal and which are spurious noise. Once the important variables have been identified, researchers can design more targeted follow-up experiments on the selected variables to investigate the underlying scientific phenomenon. A natural requirement is to discover as many relevant variables as possible while making as few false discoveries as possible. We propose a modified RuleFit model that controls the false discovery rate (FDR) through the knockoff procedure and controls the type I error through the Neyman-Pearson method.
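To make the FDR-control step concrete, the following is a minimal sketch of the standard knockoff+ selection rule, not the thesis's own implementation. It assumes the knockoff statistics W_j have already been computed (large positive values favor the original variable over its knockoff copy); the function names `knockoff_plus_threshold` and `select_features` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Smallest t among the nonzero |W_j| whose estimated FDP is at most q.

    The estimate is (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}).
    Returns infinity when no candidate qualifies, i.e. nothing is selected.
    """
    # Candidate thresholds are the distinct nonzero magnitudes, in ascending order.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        # Large negative statistics stand in for false discoveries.
        estimated_false = 1 + sum(1 for x in w if x <= -t)
        selected = max(1, sum(1 for x in w if x >= t))
        if estimated_false / selected <= q:
            return t
    return float("inf")


def select_features(w: Sequence[float], q: float) -> List[int]:
    """Indices of the variables whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Illustrative statistics only; these values are made up for the example.
w_example = [5.0, 4.0, 3.0, -1.0, 2.5, 0.5, -0.2, 6.0]
print(select_features(w_example, q=0.25))
```

The extra 1 in the numerator is what distinguishes knockoff+ from the plain knockoff rule; it makes the selection slightly more conservative in exchange for a guarantee on the FDR itself rather than on a modified version of it.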

Keywords

Knockoff, FDR, Lasso, Neyman-Pearson method

