  • Thesis


Analyzing Google Search Results through Latent Semantic Analysis

Advisor: 陸承志


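This study proposes a method based on Latent Semantic Analysis for estimating the rankings of the Google search engine. We perform Latent Semantic Analysis on the web pages returned for keyword queries to assess how semantically related terms affect ranking. We apply heuristic n-gram segmentation to the result pages to extract n-grams and build a term-document matrix, which reveals the latent semantic relationships between documents and terms. We build concept groups with agglomerative clustering and present them as a bubble graph. We assess each document's relevance to the query terms from the document-document correlation matrix between the documents and a pseudo-document built from the query. Experimental results show that estimating rankings with the heuristic n-gram segmentation system outperforms using unigrams alone, and that the average R-Precision can reach 70%.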


This study proposes a method based on Latent Semantic Analysis (LSA) for analyzing Google's ranking. We applied LSA to Google's search results for a given set of queries to evaluate whether latent semantic terms contribute to ranking. We implemented a heuristic n-gram extraction tool to extract n-gram terms from search engine results pages. From these terms we constructed a term-document matrix and applied LSA to explore the latent relationships between terms and documents. We used agglomerative clustering to build concept groups and visualized them with a bubble graph. To measure how strongly each document correlates with the query, we computed a document-document correlation matrix that includes a pseudo-document built from the query. Experimental results show that the method performs better with heuristic n-gram extraction than with unigrams alone, achieving an average R-Precision of up to 70%.
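The abstract names two building blocks that are easy to make concrete: the term-document matrix built from n-grams, and the R-Precision score used for evaluation. The sketch below is illustrative only and is not the thesis's implementation. It assumes whitespace tokenization and a plain sliding-window n-gram extractor in place of the heuristic extraction tool, and it omits the LSA (singular value decomposition) step. The function names `extract_ngrams`, `term_document_matrix`, and `r_precision` are hypothetical, and the abstract does not state how the relevant set for R-Precision was defined.

```python
from collections import Counter


def extract_ngrams(tokens, max_n=2):
    """Return all contiguous n-grams (space-joined) for n = 1..max_n.

    Assumption: a plain sliding window, not the heuristic extraction
    described in the thesis.
    """
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def term_document_matrix(docs, max_n=2):
    """Build a count matrix whose rows are terms and whose columns are documents."""
    # Assumption: lowercase and split on whitespace as a stand-in tokenizer.
    counts = [Counter(extract_ngrams(doc.lower().split(), max_n)) for doc in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix


def r_precision(ranked_ids, relevant_ids):
    """Fraction of the top-R ranked items that are relevant, with R = len(relevant_ids)."""
    r = len(relevant_ids)
    if r == 0:
        return 0.0
    return sum(1 for d in ranked_ids[:r] if d in relevant_ids) / r


if __name__ == "__main__":
    docs = ["latent semantic analysis", "semantic search ranking"]
    terms, matrix = term_document_matrix(docs)
    for term, row in zip(terms, matrix):
        print(f"{term:20s} {row}")

    # Hypothetical ranking and relevant set, for illustration only.
    ranked = ["d3", "d1", "d4", "d2"]
    relevant = {"d1", "d2", "d3"}
    print(r_precision(ranked, relevant))
```

In the hypothetical ranking above, three documents are relevant and two of them appear in the top three positions, so the R-Precision is 2/3. In a full LSA pipeline, the count matrix would next be reduced with a truncated singular value decomposition before documents are compared with the query pseudo-document.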




