Related studies and government indicators show that the real estate market is shaped by many factors, including the attributes of the properties themselves, their surrounding environment, government policy, and even global economic conditions. This study applies text mining to 4,762 posts collected from online real estate discussion forums between January 2005 and March 2011. After word segmentation and feature-word selection, the sentiment expressed by forum participants is extracted and quantified as sentiment scores. Models are then built with Neural Network, Genetic Programming, Support Vector Regression (SVR), and Classification and Regression Tree (CART) techniques and used to estimate the number of buildings transferred through sales (the real estate trading volume) reported by the Construction and Planning Agency, Ministry of the Interior. When the results are evaluated with MAPE (Mean Absolute Percentage Error), Support Vector Regression achieves the best curve fitting on the test data. In addition, when each month's sentiment score is shifted ahead by one to six months and compared with the trading volume, sentiment that leads the volume by two to three months tracks the number of sales transfers most closely. Building on this result, the study compares which feature words occur most often and which exceed their quarterly average counts, and examines the feature words that appear when trading volume rebounds upward, holds steady, or reverses downward, as a reference for further analysis.
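The abstract names two quantitative steps: scoring each model with MAPE and shifting the monthly sentiment scores ahead by one to six months to find the lead that best matches trading volume. The sketch below illustrates both under stated assumptions. It is not the study's implementation. The function names (`mape`, `pearson`, `lead_scores`) are hypothetical. The abstract does not say how "closeness" between shifted sentiment and volume was measured, so the sketch assumes Pearson correlation. MAPE follows the standard definition (100/n) Σ |A_t − F_t| / |A_t|, where A_t is the actual monthly volume and F_t is a model's estimate.

```python
from math import sqrt
from typing import Dict, Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent: (100 / n) * sum(|A_t - F_t| / |A_t|)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, predicted)) / len(actual)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lead_scores(sentiment: Sequence[float], volume: Sequence[float],
                max_lead: int = 6) -> Dict[int, float]:
    """Correlate sentiment in month t with volume in month t + k, for k = 1..max_lead.

    Assumption: closeness is measured by Pearson correlation; the original
    study may have used a different criterion (e.g. MAPE of refitted models).
    """
    if len(sentiment) != len(volume):
        raise ValueError("series must be of equal length")
    if len(sentiment) < max_lead + 2:
        raise ValueError("series too short for the requested leads")
    scores = {}
    for k in range(1, max_lead + 1):
        # Pair sentiment[t] with volume[t + k]; the last k sentiment months
        # and the first k volume months have no partner and are dropped.
        scores[k] = pearson(sentiment[:-k], volume[k:])
    return scores
```

Usage: `best = max(scores, key=scores.get)` on the result of `lead_scores(monthly_sentiment, monthly_volume)` picks the lead with the strongest correlation. Correlation is used here because it ignores the different scales of sentiment scores and building counts; MAPE is kept for comparing model estimates against actual volume, where both series share the same units.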