
知識分享社群中最佳解答之預測

Predicting the Best Answers in Community-driven Question and Answering Websites

Advisor: 楊錦生

Abstract


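In today's era of information explosion on the Internet, the large volume of results returned by search engines usually has to be filtered further by users before they find what they need. In the subsequent Web 2.0 era, community-driven question answering (CQA) websites emerged to help users obtain the information they need more quickly and conveniently. With the rapid growth of CQA websites, however, information quality has come to vary sharply, from professional to indiscriminate, and users faced with large amounts of information of uneven quality do not know how to choose. This study therefore identifies feature variables of the information content on such community websites, analyzes that content with data mining techniques, and examines the relationships among the variables, with the aim of establishing a method for automatically predicting the best answers on CQA websites. In the experimental results, among the three categories of variables examined (user, answer, and question), the answer variables combined with the question and user variables predict the best answer most effectively. A comparison of single classifiers and multiple classifiers shows that, once the effect of missing data is removed, multiple classifiers effectively improve prediction performance. Of the four models built in this study, [answer + user + question] combined with a classification and regression tree (Simple CART) gives the best prediction performance, with an accuracy of 73.98%.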

Parallel Abstract


With the rapid growth of Internet technology and related applications, the volume of information available online has increased dramatically, and the situation we face has changed. As John Naisbitt said, “We are drowning in information but starved for knowledge.” The development of effective and efficient knowledge discovery techniques for retrieving useful knowledge from huge datasets has therefore become an essential issue. The maturation of Web 2.0 applications provides opportunities and tools for extending existing knowledge discovery techniques. Community-driven question answering (CQA) websites are a typical example: they help users obtain useful information (i.e., an answer to a specific question) more rapidly and conveniently. However, the quality of answers varies widely, from professional to dilettante, which raises the further problem of determining the best answer among the candidates. This study therefore focuses on establishing an automatic method for predicting the best answers in a CQA environment. We extract a comprehensive set of features and classify them into three categories, namely answer-related (A), question-related (Q), and user-related (U) variables. On the basis of these three types of features, we propose four prediction models: A, AQ, AU, and AQU. Moreover, several state-of-the-art single-classifier and multi-classifier induction techniques are examined to evaluate their performance on best answer prediction in CQA websites. A dataset collected from Yahoo! Knowledge+ is used for evaluation. In our empirical evaluation, the AQU model combined with a classification and regression tree (Simple CART) achieves the best accuracy, 73.98%.
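The experimental design described above (four feature-set models, each evaluated with a single classifier and with a multi-classifier) can be sketched roughly as follows. This is only an illustrative sketch under several assumptions: the feature names, the synthetic data, and the labeling rule are all hypothetical, a one-split decision stump stands in for a tree learner, and bagging is used merely as one example of a multi-classifier technique. None of this is taken from the thesis itself.

```python
import random
from collections import Counter

# Hypothetical feature names, grouped into answer (A), question (Q), user (U).
FEATURE_GROUPS = {
    "A": ["answer_length", "answer_rank"],
    "Q": ["question_length"],
    "U": ["user_points"],
}
# The four models named in the abstract; each letter selects a feature group.
MODELS = ["A", "AQ", "AU", "AQU"]


def columns_for(model):
    """Return the feature names used by a model such as 'AQ'."""
    return [name for group in model for name in FEATURE_GROUPS[group]]


def train_stump(rows, labels, columns):
    """Fit a one-split stump: the (column, threshold, polarity) with fewest training errors."""
    best = None
    for col in columns:
        for threshold in sorted({row[col] for row in rows}):
            for polarity in (True, False):
                preds = [(row[col] >= threshold) == polarity for row in rows]
                errors = sum(p != y for p, y in zip(preds, labels))
                if best is None or errors < best[0]:
                    best = (errors, col, threshold, polarity)
    _, col, threshold, polarity = best
    return lambda row: (row[col] >= threshold) == polarity


def train_bagging(rows, labels, columns, n_estimators=11, seed=0):
    """Train stumps on bootstrap resamples and combine them by majority vote."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(rows)) for _ in rows]
        stumps.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], columns))
    return lambda row: Counter(s(row) for s in stumps).most_common(1)[0][0]


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def make_toy_data(n=200, seed=1):
    """Generate synthetic rows for the demo; this is NOT data from the study."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        row = {
            "answer_length": rng.randint(5, 500),
            "answer_rank": rng.randint(1, 10),
            "question_length": rng.randint(5, 200),
            "user_points": rng.randint(0, 1000),
        }
        # Arbitrary labeling rule plus noise, chosen only so the demo has something to learn.
        score = row["answer_length"] / 500 + row["user_points"] / 1000 + rng.gauss(0, 0.3)
        rows.append(row)
        labels.append(score > 1.0)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_toy_data()
    train_rows, test_rows = rows[:150], rows[150:]
    train_y, test_y = labels[:150], labels[150:]
    for model in MODELS:
        cols = columns_for(model)
        single = train_stump(train_rows, train_y, cols)
        bagged = train_bagging(train_rows, train_y, cols)
        print(model, accuracy(single, test_rows, test_y), accuracy(bagged, test_rows, test_y))
```

Running the script prints, for each of the four models, the held-out accuracy of the single stump and of the bagged ensemble on the synthetic data. Those numbers say nothing about the thesis results. The sketch only shows how feature groups can be combined and compared under one evaluation loop.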

Cited by


陳鉞堂 (2017). 兩岸空運直航對桃園機場競爭力之影響 [Master's thesis, Tamkang University]. Airiti Library. https://doi.org/10.6846/TKU.2017.00334
龍威任 (2011). 診斷關聯制度實施對醫療行為之衝擊-以某區域醫院為例 [Master's thesis, Yuan Ze University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0009-2801201415000850
