A Study of Web Query Classification Based on Latent Topic Analysis

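Web search engines now play a very important role in helping people efficiently find the information they want in the vast amount of data on the Web. Query classification is an important topic in search engine technology: the task is to assign each query correctly to the relevant categories. Query classification faces two main difficulties. First, most query strings are short and ambiguous. Second, many queries carry two or more user intents. This study therefore proposes a method that first uses multiple search engines to expand short queries, then extracts from the expanded information the multiple topics a query may cover, using Latent Dirichlet Allocation to recover the latent semantics. Compared with the method proposed by Shen et al. in 2005, the proposed method improves precision by 6.5% and F1 by 6.6%. The experiments show that our method effectively improves query classification performance.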

Keywords

query; classification; query classification; information extraction and retrieval

Parallel Abstract

Nowadays, Web search engines play an important role in helping people find information effectively in massive Web data. The Web query classification (WQC) problem is a crucial issue in search engine technology. The task of WQC is to classify Web queries into relevant Web categories. The WQC problem poses two major difficulties. First, most queries are short and ambiguous. Second, many queries carry more than one user intention. This research therefore proposes a scheme that exploits multiple search engines to enrich user queries and then extracts multiple latent topics from the expanded queries. The scheme uses the Latent Dirichlet Allocation (LDA) model to extract the latent topics from the enriched queries for query classification. The experiments show that our approach improves precision by 6.5% and F1 by 6.6% in comparison with the schemes proposed by Shen et al. in 2005. These results indicate that the proposed LDA-based scheme can effectively improve WQC performance.

Parallel Keywords

query; classification; query classification; information retrieval

References

Web Query Classification Using Labeled and Unlabeled Training Data,” in

and Development in Information Retrieval (SIGIR 2006), Salvador, Brazil, August

Conference on Data Mining (ICDM’05), 2005, pp. 42–49.

X.-J. Yuan, and C. Cool, “Query Length in Interactive Information Retrieval,” in

and Development in Information Retrieval (SIGIR 2003), 2003, pp. 205–212.

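Cited By

邱偉哲 (2017). Factors Influencing University Students' Participation in Service Learning: The Case of the 2017 Taipei Universiade [Master's thesis, Tamkang University]. Airiti Library. https://doi.org/10.6846/TKU.2017.00289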
