應用實價登錄建立以聚類方法之堆疊泛化房價預測模型 -以桃園市區分建物房價資料為例

本研究探討結合聚類分析的堆疊泛化模型對台灣房價預測的適用性。利用最新可用的桃園市實價登錄資料, 本研究首先拓展了Trivedi et. al (2015) 的聚類分析集成學習方法，建立了一個聚類分析的兩層堆疊泛化模型。第一層聚類分析群模型分別由Lasso，KNN以及決策樹建立，第二層元模型分別由線性迴歸、隨機森林以及XGBoost所建立。接下來用此拓展的兩層聚類分析堆疊泛化模型預測了桃園市房價資料，並與其他機器學習模型，包括線性迴歸、隨機森林和XGBoost，比較他們的預測結果。

關鍵字

特徵選取；聚類分析；機器學習；集成學習；堆疊泛化；實價登錄；房價預測

並列摘要

This research explores the applicability of combining clustering technique with stacked generalization for Taiwan housing prices prediction. Taking advantage of the most currently available Taoyuan City Actual Price Registration Data, we first expanded the clustering-based ensemble learning method by Trivedi et al. (2015) to develop two-layer clustering-based stacked generalizers. In the first layer, three machine learning methods (Lasso, KNN and Decision Tree) were used to construct the cluster models. In the second layer, Linear Regression, Random Forest and XGBoost were used to build meta models. These developed stacked generalizers are then used to predict housing prices in the Taoyuan City. Their prediction accuracies are then compared with that from other machine learning methods, including Linear Regression, Random Forest and XGBoost.