In recent years, Taiwan's persistently high house price-to-income ratio has made housing prices a major concern for both the public and the government. Among the measures the government has adopted is the actual price registration system for real estate transactions. Combined with the ongoing release of open data, this has made an unprecedented volume of real estate transaction information publicly available, offering a new opportunity to study how residential characteristics and surrounding facilities affect housing prices. The data, however, come from databases scattered across many sources and are large in volume, complex in type, diverse in the situations they cover, and riddled with outliers, which poses a substantial challenge for analysis. This study uses random forest, a method that has performed well in many applications, as the model for exploring the factors that influence housing prices. In our empirical analysis, random forest outperformed the methods it was compared against on every evaluation metric used, which suggests that it is well suited to this kind of housing price analysis. Using the model's variable importance measure, we identified the 15 most important factors among 45 broadly collected predictor variables and describe how each relates to housing price. This study is also the first to find that the "remarks" (備註) field of the actual price registration data is the second most important variable, surpassed only by the administrative district.