A Study on Automatic Extraction of Associated Sentences for Two Domain-Specific Terms

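The purpose of this thesis is, given a reliable text data source and two domain-specific terms entered by the user, to automatically extract associated sentences or associated sentence pairs from the source according to the relationship between the terms, helping the user compare the concepts behind the two terms. We divide term relationships into two categories: contained and not-contained. The system uses a web search engine to search for each query term separately, collects the top-K web page snippets containing each query term, and uses the probability that each query term appears in the other term's snippets as features for automatic classification by a term-relationship classification model. If the two query terms are classified as having a "contained" relationship, the system takes the sentences containing both query terms as the candidate sentence set, matches them against an associated-sentence pattern rule model, computes their semantic relatedness to the query terms, and selects the sentence with the highest association score as the associated sentence. If the query terms are classified as having a "not-contained" relationship, the system instead takes the sentences containing either query term as the candidate sentence set, finds the common concept terms that are highly related to both query terms, groups the sentences by common concept term, evaluates the semantic relatedness scores between sentences and common concept terms and between paired sentences, and selects the two highest-scoring sentences to form an associated sentence pair. Experimental results show that the proposed method can effectively classify the relationship of a query pair automatically; that the associated sentences found by considering sentence pattern and semantic relatedness scores help users understand how the query terms are related; and that the associated sentence pairs selected by pair score can, in most cases, help users clarify how the two query terms are similar or different with respect to certain concepts.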
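The classification features described above can be illustrated with a minimal sketch. It assumes the "probability" is estimated as the fraction of one term's top-K snippets that mention the other term, and that a mention is a case-insensitive substring match; the thesis may estimate it differently. The function name `cross_occurrence_features` and the sample snippets are hypothetical.

```python
from typing import List, Tuple


def cross_occurrence_features(
    term_a: str,
    term_b: str,
    snippets_a: List[str],
    snippets_b: List[str],
) -> Tuple[float, float]:
    """Return (P(term_b appears in A's snippets), P(term_a appears in B's snippets)).

    Assumption: each probability is the share of snippets containing the
    other term as a case-insensitive substring.
    """

    def fraction_containing(term: str, snippets: List[str]) -> float:
        if not snippets:
            return 0.0
        hits = sum(1 for s in snippets if term.lower() in s.lower())
        return hits / len(snippets)

    return (
        fraction_containing(term_b, snippets_a),
        fraction_containing(term_a, snippets_b),
    )


# Hypothetical snippets standing in for search-engine results.
mammal_snippets = [
    "A mammal is a vertebrate animal.",
    "Whale and dolphin are marine mammal species.",
    "Mammal fur provides insulation.",
]
whale_snippets = [
    "The whale is a marine mammal.",
    "The blue whale is the largest mammal.",
    "Whale watching tours run daily.",
    "Whale song can travel long distances.",
]

p_whale_in_mammal, p_mammal_in_whale = cross_occurrence_features(
    "mammal", "whale", mammal_snippets, whale_snippets
)
```

An asymmetric pair of values like this one (the narrower term's snippets mention the broader term more often than the reverse) is the kind of signal a classifier could use to separate contained from not-contained pairs; the classifier itself is not sketched here.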

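Keywords

domain-specific term; question classification; sentence pattern; semantic relatedness; associated sentence; associated sentence pair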

Parallel Abstract

This thesis studies strategies for automatically extracting associated sentences or associated sentence pairs for two domain-specific query terms from a reliable text data source, according to the relationship between the terms. The goal is to help users compare the two terms using the retrieved results. Two categories of relationship between query terms are defined in this thesis: contained and not-contained. The system uses a web search engine to search for each of the two query terms and collects the top-k snippets for each. The probability of one query term appearing in the top-k snippets of the other is used as a feature to train a classifier of query-pair relationships. If the two query terms have a contained relationship, the sentences containing both terms are retrieved as candidate sentences. For each candidate sentence, an association score is computed by matching its lexical pattern against the associated-sentence rule model and by computing its semantic relatedness to the query terms. The sentence with the highest association score is selected as the associated sentence. If the relationship is not-contained, common concept terms, which have high semantic relatedness with both query terms, are extracted from the sentences containing either of the two query terms. The sentences are then grouped by common concept term. Within each group, the representation score of each candidate sentence pair is evaluated from the semantic relatedness of the sentences to the concept terms and the semantic relatedness between the two sentences. The sentence pair with the highest representation score is selected as the associated sentence pair. The experimental results show that the proposed method can effectively classify the relationships of query terms. Moreover, the retrieved associated sentences help users understand the semantic relationship between the two query terms, and the discovered associated sentence pairs also effectively help users clarify the similar and dissimilar concepts between them.
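The two candidate-retrieval rules in the abstract can be sketched as follows. This is only an illustration under stated assumptions: sentences are already split, a term "appears" in a sentence when it occurs as a case-insensitive substring, and the relationship label comes from some upstream classifier. The function name `candidate_sentences` and the sample sentences are hypothetical; pattern matching and semantic relatedness scoring are not covered.

```python
from typing import List


def candidate_sentences(
    sentences: List[str],
    term_a: str,
    term_b: str,
    relationship: str,
) -> List[str]:
    """Select candidate sentences according to the query-pair relationship.

    "contained":     keep sentences mentioning BOTH terms.
    "not-contained": keep sentences mentioning EITHER term.
    Assumption: a mention is a case-insensitive substring match.
    """
    a, b = term_a.lower(), term_b.lower()
    if relationship == "contained":
        return [s for s in sentences if a in s.lower() and b in s.lower()]
    if relationship == "not-contained":
        return [s for s in sentences if a in s.lower() or b in s.lower()]
    raise ValueError(f"unknown relationship: {relationship!r}")


# Hypothetical sentences from a text source.
corpus = [
    "The whale is a marine mammal.",
    "A mammal nurses its young with milk.",
    "Sharks are fish with cartilaginous skeletons.",
]

both = candidate_sentences(corpus, "whale", "mammal", "contained")
either = candidate_sentences(corpus, "whale", "mammal", "not-contained")
```

In the contained case the candidate set is small and each sentence can be scored directly; in the not-contained case the larger set would still need grouping by common concept term before sentence pairs are scored.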

Parallel Keywords

domain-specific term; query classification; lexical pattern; relatedness degree; associated sentence; associated sentence pair

