This study uses a machine learning approach, specifically a genetic-algorithm-based method, to approximate the actionable ranking factors of Google's search engine and their weights. By "actionable" we mean factors that website owners or online marketers can act on for search engine optimization (SEO), that is, by adjusting the internal or external quality of a web page to improve its ranking in the search results for a specific keyword. We focus on factors whose data are publicly available from the webmaster tools provided by search engines or from objective third-party providers, rather than on all possible ranking factors. Using queries for four categories of industrial products, we collected the top 20 Google search results for each query and ran experiments in three stages, each with a different set of ranking factors: (1) the relationship between external links and PageRank, (2) the relationship between Authority and PageRank, and (3) a combined experiment that estimates the weights of all factors considered. Across the different queries and factor combinations, the computed weights consistently show PageRank with a far higher weight than any other factor; adding factors such as the number of external links or Authority had little effect on the precision of the predicted ranking.
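As a rough illustration of how a genetic algorithm can search for factor weights that reproduce an observed ranking, the following minimal sketch may help. It is not the procedure used in this study: the linear scoring function, the pairwise-agreement fitness measure, the truncation selection, and all function names and parameter values are assumptions made for the example. Each page is assumed to be a list of factor values (for instance PageRank, external links, Authority) already normalized to a common scale, with pages listed in their observed rank order.

```python
import random
from itertools import combinations


def score(weights, factors):
    """Weighted sum of one page's normalized factor values (assumed linear model)."""
    return sum(w * f for w, f in zip(weights, factors))


def pairwise_agreement(weights, pages):
    """Fraction of page pairs whose predicted order matches the observed order.

    `pages` is listed in observed rank order (index 0 is the top result),
    so a pair (i, j) with i < j agrees when page i scores strictly higher.
    """
    scores = [score(weights, p) for p in pages]
    pairs = list(combinations(range(len(pages)), 2))
    concordant = sum(1 for i, j in pairs if scores[i] > scores[j])
    return concordant / len(pairs)


def normalize(weights):
    """Rescale weights to sum to 1; fall back to uniform if all are zero."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def evolve(pages, n_factors, pop_size=30, generations=50,
           mutation_rate=0.2, seed=0):
    """Hypothetical GA loop: truncation selection, uniform crossover, Gaussian mutation."""
    rng = random.Random(seed)
    population = [normalize([rng.random() for _ in range(n_factors)])
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half of the population unchanged (elitism).
        population.sort(key=lambda w: pairwise_agreement(w, pages), reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Uniform crossover: take each weight from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            if rng.random() < mutation_rate:
                k = rng.randrange(n_factors)
                child[k] = max(0.0, child[k] + rng.gauss(0, 0.1))
            children.append(normalize(child))
        population = survivors + children
    return max(population, key=lambda w: pairwise_agreement(w, pages))
```

Calling `evolve(pages, n_factors=3)` on the factor vectors of one query's top results returns a weight vector that sums to 1, which makes the relative weight of each factor directly comparable. A dominant factor would then show up as a weight close to 1.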