A Study on Question Analysis and Answer Passage Retrieval in Opinion Question Answering Systems

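Automatic question answering is a very active research area in natural language processing. A user enters a question in natural language, and a question answering system combines various natural language processing techniques to retrieve relevant answers for the user quickly and efficiently. Users are interested not only in facts but also in opinions. This thesis designs and implements an opinion question answering system that answers questions about people's opinions, feelings, or thoughts. A question answering system generally consists of three main components: question analysis, answer passage retrieval, and answer extraction. Our research focuses on the first two, namely question analysis and answer passage retrieval. Question analysis comprises three main tasks: determining the question type, the question focus, and the question polarity. We define six opinion question types and propose a two-layer question classifier. The first-layer classifier decides whether a question asks for a fact or an opinion. If it asks for an opinion, the second-layer classifier assigns it to one of the six defined opinion question types. The two classifiers achieve F-measures of 87.8% and 92.5%, respectively. We then discuss methods for determining the question focus and the question polarity. The detected focus is submitted to an information retrieval (IR) system to find sentences relevant to the question, which serve as candidate answer sentences. The detected question polarity is compared with the polarity of the candidate answer sentences, and only candidates with the same polarity are kept. Answer passage retrieval also comprises three main tasks. For each relevant sentence returned by the IR system, we determine whether the focus (focus scope identification) lies within an opinion scope (opinion scope identification) and, if so, whether the polarity of that scope matches the question's (polarity detection). The experiments cover 18 different combinations. The best model, using partial focus matching, achieves an F-measure of 40.59% at the opinion-scope level. When the effect of relevance is removed, the F-measure of the best model rises to 87.18%. We further group the questions by topic and experiment on each group; the results show that questions on different topics vary in difficulty. Finally, we summarize the current experimental results and raise several interesting issues for future research toward a complete opinion question answering system.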
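The abstract does not say which features or learning algorithms the two classifiers use. The following is therefore only a minimal Python sketch of the cascade's control flow and of the polarity filter on candidate answer sentences, assuming each classifier layer is supplied as a callable. The labels `OPINION_TYPE_1` to `OPINION_TYPE_6`, the function names, and the toy keyword classifiers are hypothetical stand-ins, not the thesis's actual question types or models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical placeholder labels: the thesis defines six opinion question
# types, but the abstract does not name them.
OPINION_TYPES = [f"OPINION_TYPE_{i}" for i in range(1, 7)]


@dataclass
class QuestionAnalysis:
    is_opinion: bool
    opinion_type: Optional[str]  # None for factual questions


def analyze_question(
    question: str,
    is_opinion_clf: Callable[[str], bool],
    opinion_type_clf: Callable[[str], str],
) -> QuestionAnalysis:
    """Two-layer cascade: layer 1 separates factual from opinion questions;
    layer 2 runs only on opinion questions and picks one of six types."""
    if not is_opinion_clf(question):
        return QuestionAnalysis(is_opinion=False, opinion_type=None)
    label = opinion_type_clf(question)
    if label not in OPINION_TYPES:
        raise ValueError(f"unknown opinion question type: {label}")
    return QuestionAnalysis(is_opinion=True, opinion_type=label)


def filter_by_polarity(
    question_polarity: str, candidates: List[Tuple[str, str]]
) -> List[str]:
    """Keep only candidate sentences whose polarity equals the question's."""
    return [sent for sent, polarity in candidates if polarity == question_polarity]


# Toy stand-ins for trained classifiers (hypothetical keyword rules).
def toy_is_opinion(q: str) -> bool:
    return any(w in q.lower() for w in ("think", "feel", "opinion"))


def toy_opinion_type(q: str) -> str:
    return OPINION_TYPES[0]


print(analyze_question("What do people think of the new policy?", toy_is_opinion, toy_opinion_type))
print(analyze_question("When was the policy announced?", toy_is_opinion, toy_opinion_type))
print(filter_by_polarity("positive", [("It works well.", "positive"), ("It failed.", "negative")]))
```

Keeping each layer as a plain callable lets any trained model be plugged in without changing the cascade. The second layer is simply never consulted for factual questions.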

Keywords

opinion question answering system; question answering; question analysis; answer passage retrieval; opinion extraction

Parallel Abstract

Question answering (QA) systems provide an elegant way for people to access an underlying knowledge base. People are interested not only in factual questions but also in opinions. In this thesis, an opinion QA system that handles opinion questions is proposed. We investigate techniques for question analysis and answer passage retrieval. For question analysis, six opinion question types are defined, and a two-layered framework using two question type classifiers is proposed. Algorithms for these two classifiers are discussed. The classifiers achieve F-measures of 87.8% in general question classification and 92.5% in opinion question classification. The question's focus and polarity are also detected: the focus is used to form an IR query, and the polarity is used to keep only relevant sentences that have the same polarity as the question. For answer passage retrieval, three components are introduced. For each relevant sentence retrieved by the IR system, we identify whether the focus (Focus Scope Identification) lies within an opinion text span (Opinion Scope Identification) and, if so, whether the polarity of that span matches the polarity of the question (Polarity Detection). A total of 18 combinations are proposed and evaluated. The best model achieves an F-measure of 40.59% using partial match at the boundary level. With relevance issues removed, the F-measure of the best model rises to 87.18%. We further break down the experimental results by topic; the results show that questions on different topics vary in difficulty. We conclude with some as yet unsolved but interesting problems for future work toward a complete opinion QA system.
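The three answer passage components and the partial-match F-measure can be illustrated with a minimal Python sketch. It assumes that focus occurrences and opinion scopes are given as character-offset spans, each scope carrying a polarity label. The span representation, the `OpinionScope` type, the function names, and the reading of "partial match" as "any character overlap" are assumptions made for illustration. The thesis's 18 model combinations are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[int, int]  # half-open character offsets [start, end)


@dataclass
class OpinionScope:
    span: Span
    polarity: str  # e.g. "positive" or "negative"


def overlaps(a: Span, b: Span) -> bool:
    """Assumed partial match: two spans match if they share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def select_answer_passages(
    focus_spans: List[Span], scopes: List[OpinionScope], question_polarity: str
) -> List[Span]:
    selected = []
    for scope in scopes:
        # Focus scope + opinion scope identification: does any focus
        # occurrence fall (at least partially) inside this opinion scope?
        if not any(overlaps(f, scope.span) for f in focus_spans):
            continue
        # Polarity detection: keep the scope only if it agrees with the question.
        if scope.polarity == question_polarity:
            selected.append(scope.span)
    return selected


def partial_match_f1(predicted: List[Span], gold: List[Span]) -> float:
    """F-measure in which a span counts as correct if it overlaps any span on the other side."""
    if not predicted or not gold:
        return 0.0
    precision = sum(any(overlaps(p, g) for g in gold) for p in predicted) / len(predicted)
    recall = sum(any(overlaps(g, p) for p in predicted) for g in gold) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


scopes = [
    OpinionScope((0, 20), "positive"),
    OpinionScope((25, 40), "negative"),
    OpinionScope((5, 12), "negative"),
]
answers = select_answer_passages([(10, 14)], scopes, "positive")
print(answers)  # [(0, 20)]
print(round(partial_match_f1(answers, [(3, 8), (30, 35)]), 4))  # 0.6667
```

In the example, the scope (5, 12) contains the focus but is discarded by polarity detection, and (25, 40) never touches the focus. The single predicted span matches one of two gold spans, so precision is 1.0, recall is 0.5, and the F-measure is 2/3.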

Parallel Keywords

opinion question answering system; question answering system; question analysis; answer passage retrieval; opinion extraction

