Estimating Google Search Engine Rankings Using Semantically Related Terms

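This study examines whether a linear combination of terms semantically related to the query keyword can approximate Google search engine rankings. The focus is on the latent semantics of web pages and on how keywords appear in page titles, snippets, and URLs, rather than on all ranking factors. We extract the title, snippet, and URL of each page from Google's search results and tokenize them into n-grams. We then use two methods, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA), to find terms in the pages that are semantically related to the query keyword, compute the weights of the keywords in the title, snippet, and URL of each result page, and linearly combine these three weights into a single score for the page. Experiments were run on eight parameter combinations formed from the number of semantically related terms, the number of web documents, uni-gram versus n-gram related terms, and related terms drawn from one topic versus two topics. The results show that rankings are best with 20 semantically related terms and 20 web documents; the best R-Precision across all parameter combinations reaches 0.8, indicating that the new ranking produced by the proposed method is quite close to Google's original ranking.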
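The scoring step described above, a linear combination of title, snippet, and URL weights, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the `Result` class, the `field_score` heuristic (the fraction of related terms found in a field), and the weights `(0.5, 0.3, 0.2)`. The study does not state its weighting scheme in this abstract.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One search result, reduced to the three fields the abstract names."""
    title: str
    snippet: str
    url: str


def field_score(text, terms):
    """Hypothetical field weight: fraction of related terms found in the text."""
    if not terms:
        return 0.0
    lowered = text.lower()
    return sum(1 for term in terms if term.lower() in lowered) / len(terms)


def document_score(result, terms, weights=(0.5, 0.3, 0.2)):
    """Linearly combine title, snippet, and URL scores (weights are assumed)."""
    w_title, w_snippet, w_url = weights
    return (
        w_title * field_score(result.title, terms)
        + w_snippet * field_score(result.snippet, terms)
        + w_url * field_score(result.url, terms)
    )


def rerank(results, terms, weights=(0.5, 0.3, 0.2)):
    """Sort results by descending document score."""
    return sorted(
        results,
        key=lambda r: document_score(r, terms, weights),
        reverse=True,
    )


results = [
    Result("Cooking pasta", "how to boil", "example.com/food"),
    Result("SEO ranking guide", "search engine ranking factors",
           "example.com/seo-ranking"),
]
related_terms = ["seo", "ranking"]
reranked = rerank(results, related_terms)
```

In this toy input the second result moves to the top because both related terms occur in its title and URL. In the study itself the related terms would come from LSA or LDA rather than from a hand-written list.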

Keywords

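Search Engine Optimization; Search Engine Ranking Factors; Semantically Related Terms; Web Page Search; Latent Semantic Analysis; Latent Dirichlet Allocation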

Parallel Abstract

This study aims to approximate Google's ranking results using terms semantically related to the query. First, we crawled Google search results and extracted each web page's title, snippet, and URL. We then identified semantically related terms using two approaches, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Next, we calculated scores for keywords in the title, snippet, and URL to obtain a document score. Several experiments were conducted on different combinations of the number of semantically related terms, the number of documents, uni-gram and n-gram tokenization, and semantically related terms drawn from one topic or two topics. The experimental results show that the average R-Precision reaches 0.8, indicating that the ranking produced by the proposed method approximates Google's results.
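As a rough illustration of the evaluation measure, the sketch below computes R-Precision between a reference ranking and a re-ranked list. It assumes one common reading of the measure: the top `r` results of the reference ranking are treated as the relevant set, and the score is the fraction of the top `r` re-ranked results that fall in that set. The abstract does not say how the study chose R or defined relevance, so the function name, parameters, and this interpretation are assumptions.

```python
def r_precision(reference, candidate, r):
    """Fraction of the candidate's top-r items that are in the reference's top r.

    `reference` and `candidate` are lists of document identifiers ordered
    from best to worst. Treating the reference top r as the relevant set
    is an assumption, not a detail confirmed by the source.
    """
    if r <= 0:
        raise ValueError("r must be a positive integer")
    relevant = set(reference[:r])
    hits = sum(1 for doc in candidate[:r] if doc in relevant)
    return hits / r


google_order = ["a", "b", "c", "d", "e"]
new_order = ["b", "a", "d", "c", "e"]
score_at_3 = r_precision(google_order, new_order, 3)
```

Under this reading, a value of 0.8 would mean that four out of every five documents in the re-ranked top R also appear in the reference top R.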

Parallel Keywords

Web Page Search; Latent Semantic Analysis; Latent Dirichlet Allocation; Search Engine Optimization; Ranking Factors; Semantically Related Terms
