Search Results Clustering Based on Semantically Related Terms

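This study investigates how to use Latent Semantic Analysis (LSA) to find terms related to Google search results, group those related terms according to the concept dimensions that the analysis produces, and then cluster the search results by the related terms each web page contains. We crawl the top 20 to 50 Google search results for a query and apply LSA to them to find terms related to the query keyword. A concept-dimension filtering mechanism then groups the related terms, and the more representative terms in each group are computed and used as the cluster title, so that users can readily understand what each cluster contains. The proposed method uses a concept-dimension threshold to determine a suitable number of clusters for each query's search results, so the number does not have to be fixed in advance. Each web page in the search results is assigned either to one primary document cluster or to several document clusters, according to the concept-dimension scores that its related terms have across the related-term clusters. Finally, we use the Silhouette Coefficient to evaluate the performance of the proposed related-term clustering and document clustering methods, and we compare them with other clustering systems.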
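As a rough illustration of the term-grouping idea described above, the following Python sketch applies a truncated SVD (the core of LSA) to a small term-document matrix and groups each term under the concept dimension on which it loads most strongly. The function name `group_terms_by_concept`, the `energy_threshold` parameter, and the "dominant dimension" assignment rule are assumptions made for illustration only; the abstract does not specify the actual filtering mechanism or threshold.

```python
import numpy as np


def group_terms_by_concept(term_doc, terms, energy_threshold=0.2):
    """Group terms by their dominant LSA concept dimension.

    term_doc: (n_terms, n_docs) matrix of term weights.
    terms: list of term strings, one per row of term_doc.
    energy_threshold: hypothetical cut-off (not from the source); a concept
        dimension is kept when its singular value is at least this fraction
        of the largest singular value.
    """
    matrix = np.asarray(term_doc, dtype=float)
    # LSA: factor the term-document matrix as U * diag(s) * Vt.
    U, s, _ = np.linalg.svd(matrix, full_matrices=False)

    # Keep only sufficiently strong concept dimensions, so the number of
    # clusters follows from the data instead of being fixed in advance.
    keep = np.flatnonzero(s >= energy_threshold * s[0])

    # Weight each term's coordinates by the singular values and assign the
    # term to the retained dimension with the largest absolute loading.
    loadings = np.abs(U[:, keep] * s[keep])
    dominant = keep[np.argmax(loadings, axis=1)]

    clusters = {}
    for term, dim in zip(terms, dominant):
        clusters.setdefault(int(dim), []).append(term)
    return clusters


# Toy data (invented): two unrelated topics across four documents.
example_terms = ["engine", "wheel", "cat", "jungle"]
example_matrix = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
example_clusters = group_terms_by_concept(example_matrix, example_terms)
```

In this toy case the terms from the two topics fall into separate groups, and the number of groups is not passed in as a parameter. A real pipeline would build the matrix from the crawled pages and would need its own weighting and title-selection steps, which are not sketched here.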

Keywords

Search results clustering; document clustering; Latent Semantic Analysis; semantically related terms

Parallel Abstract

This study proposes a method based on Latent Semantic Analysis (LSA) that finds semantically related terms in the Google search results for a given query and groups those terms into clusters. Each search result is then assigned to a cluster based on the terms it contains. The top 20 to 50 search results for each query are crawled and analyzed with LSA. A heuristic method then clusters the semantically related terms according to their concept dimension significance. For each cluster, the terms with high representative values are chosen as the title words. The proposed method determines the best-fit number of clusters for each query, so the number of clusters does not have to be defined in advance. A web page containing multiple terms is either assigned to one primary cluster or allocated to multiple clusters, based on either coverage or concept dimension significance. Finally, clustering quality is evaluated with the silhouette coefficient on experimental results for a mixed set of popular and industrial keywords, and is compared with that of Carrot2, a popular clustering engine.
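For readers unfamiliar with the evaluation measure, the sketch below computes the mean silhouette coefficient for a labelled set of points. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the score is (b - a) / max(a, b). The use of Euclidean distance and the function name `silhouette_coefficient` are assumptions made here; the abstract does not state which distance or feature representation the study used.

```python
import numpy as np


def silhouette_coefficient(points, labels):
    """Return the mean silhouette score over all points.

    points: (n_samples, n_features) array-like.
    labels: cluster label for each point.
    Euclidean distance is an assumption of this sketch.
    """
    X = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    unique_labels = np.unique(labels)
    if len(unique_labels) < 2:
        raise ValueError("silhouette needs at least two clusters")

    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            continue  # singleton cluster: score is conventionally 0
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself is excluded by n_same - 1).
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean()
                for c in unique_labels if c != labels[i])
        denom = max(a, b)
        if denom > 0:
            scores[i] = (b - a) / denom
    return float(scores.mean())


# Toy data (invented): two well-separated pairs of points.
toy_points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good_score = silhouette_coefficient(toy_points, [0, 0, 1, 1])
bad_score = silhouette_coefficient(toy_points, [0, 1, 0, 1])
```

Scores near 1 indicate compact, well-separated clusters, while scores near or below 0 indicate overlapping or misassigned ones, so the labelling that matches the geometry scores higher than the one that cuts across it.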

Parallel Keywords

Search Results Clustering; Latent Semantic Analysis; Semantically Related Terms; Document Clustering
