This study investigates the effect of topic-relevant terms on the ranking of search engine results. The workflow is divided into four stages: data pre-processing, relevant-term selection, term weight matrix construction, and rank score calculation. First, we crawled and parsed web pages from general websites, then segmented the text with CKIP and the Hybrid Chinese Segmentation System. Second, we calculated the TFICF value of each term and applied a threshold to select the relevant terms for each category; an SVM classifier was then used to verify the usability of the selected terms. Finally, we built a weight matrix for each category and calculated the correlation coefficients between relevant terms. The correlation coefficients between the terms and the query were used to compute a document's score with respect to a given query. In the experiments, the Precision, Recall, and F1-measure of the SVM verification were all above 0.87, indicating that the selected terms are relevant to their corresponding categories. For ranking prediction, the R-precision was between 0.7 and 0.9, indicating that topic-relevant terms have an impact on search result ranking.
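To make two of the quantities named above concrete, the following is a minimal Python sketch of TFICF-based term selection and of R-precision. The abstract does not give the exact formulas, so this assumes a common TF-ICF form (relative term frequency within a category multiplied by the log of the number of categories over the number of categories containing the term) and the standard definition of R-precision. The function names, the toy corpus, and the threshold value are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def tficf(category_tokens):
    """Score every term in every category with an assumed TF-ICF formula.

    category_tokens maps a category name to the list of segmented tokens
    pooled from that category's pages (assumed input layout).
    """
    n_categories = len(category_tokens)
    term_counts = {c: Counter(toks) for c, toks in category_tokens.items()}

    # Category frequency: in how many categories does each term occur?
    category_freq = Counter()
    for counts in term_counts.values():
        category_freq.update(counts.keys())

    scores = {}
    for category, counts in term_counts.items():
        total = sum(counts.values())
        scores[category] = {
            term: (n / total) * math.log(n_categories / category_freq[term])
            for term, n in counts.items()
        }
    return scores


def select_relevant_terms(scores, threshold):
    """Keep the terms whose score exceeds the threshold (illustrative value)."""
    return {
        category: {t for t, s in term_scores.items() if s > threshold}
        for category, term_scores in scores.items()
    }


def r_precision(ranked_docs, relevant_docs):
    """Fraction of the top-R ranked documents that are relevant, R = |relevant|."""
    r = len(relevant_docs)
    if r == 0:
        return 0.0
    hits = sum(1 for doc in ranked_docs[:r] if doc in relevant_docs)
    return hits / r


# Toy data, purely illustrative.
corpus = {
    "sports": ["ball", "team", "ball", "news"],
    "finance": ["stock", "bank", "news", "stock"],
}
scores = tficf(corpus)
selected = select_relevant_terms(scores, threshold=0.2)
```

In this toy corpus, "news" occurs in both categories, so its inverse category frequency is log(2/2) = 0 and it is never selected, which matches the intuition that a relevant term should be concentrated in one category.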