  • Degree thesis


Predicting the Best Answers in Community-driven Question and Answering Websites

Advisor: 楊錦生


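In today's era of information explosion on the Internet, the large volume of results returned by search engines often has to be filtered further before users find what they need. Later, in the Web 2.0 era, community-driven question answering (CQA) websites emerged, helping users obtain the information they need more quickly and conveniently. However, as CQA websites have grown rapidly, the quality of their information has shifted sharply, ranging from professional to indiscriminate, and users facing large amounts of information of uneven quality do not know how to choose. This study therefore identifies feature variables of the information content on CQA websites, analyzes that content with data mining techniques, and examines the relationships among the variables, with the aim of establishing a method that automatically predicts the best answer on CQA websites. In the experimental results, among the three categories of variables examined (user, answer, and question), answer variables combined with question and user variables predict the best answer most effectively. A comparison of single classifiers and multi-classifiers shows that, once the effect of missing data is removed, multi-classifiers effectively improve prediction performance. Of the four models built in this study, [answer + user + question] combined with a classification and regression tree (Simple CART) predicts best, with an accuracy of 73.98%.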


With the rapid growth of Internet technology and related applications, the volume of information available online has increased dramatically, and the situation we face has changed. As John Naisbitt said, “We are drowning in information but starved for knowledge.” Thus, developing effective and efficient knowledge discovery techniques to retrieve useful knowledge from huge datasets has become an essential issue. The maturation of Web 2.0 applications has provided opportunities and tools to extend existing knowledge discovery techniques. Community-driven question answering (CQA) websites are a typical example: they help users obtain useful information (i.e., an answer to a specific question) more rapidly and conveniently. However, the quality of answers varies widely, from professional to amateurish, which raises the further problem of determining the best answer among the candidates. Therefore, this study focuses on establishing an automatic method to predict the best answers in the CQA environment. We extract a comprehensive set of features and classify them into three categories, namely answer-related (A), question-related (Q), and user-related (U) variables. On the basis of these three types of features, we propose four prediction models: A, AQ, AU, and AQU. Moreover, several state-of-the-art single-classifier and multi-classifier induction techniques are examined to evaluate their performance on best answer prediction in CQA websites. A dataset collected from Yahoo! Knowledge+ is employed for evaluation. In our empirical evaluation, the AQU model combined with a classification and regression tree (Simple CART) achieves the best prediction performance, with an accuracy of 73.98%.
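
As a rough illustration of how the four models relate to the three feature categories, the following minimal Python sketch composes each model's feature set from the groups its name lists. The feature names (`answer_length`, `answerer_points`, and so on) and the helper functions are hypothetical placeholders chosen for this sketch; the abstract names only the categories A, Q, and U, not the individual variables.

```python
# Hypothetical feature names; the abstract specifies only the three
# categories (A, Q, U), not the variables inside each one.
FEATURE_GROUPS = {
    "A": ["answer_length", "answer_order"],           # answer-related
    "Q": ["question_length", "num_answers"],          # question-related
    "U": ["answerer_points", "answerer_best_rate"],   # user-related
}

# The four prediction models named in the abstract.
MODELS = ("A", "AQ", "AU", "AQU")


def model_features(model):
    """Return the feature names used by a model such as 'AQ'."""
    names = []
    for group in model:  # each letter of the model name is one category
        names.extend(FEATURE_GROUPS[group])
    return names


def project(record, model):
    """Keep only the features of `record` that the given model uses."""
    return {name: record[name] for name in model_features(model)}


if __name__ == "__main__":
    # One made-up candidate answer, for demonstration only.
    sample = {
        "answer_length": 120, "answer_order": 2,
        "question_length": 40, "num_answers": 5,
        "answerer_points": 880, "answerer_best_rate": 0.31,
    }
    for model in MODELS:
        print(model, project(sample, model))
```

Each projected record could then be passed to whichever classifier is being compared, so that the models differ only in the feature categories they see.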




