A Mechanism for Providing Derived Terms through Query Rewriting

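With the widespread use of search engines and the rapid growth of the Internet, website operators apply various techniques to raise the ranking of their pages in search results; keyword techniques are among the most commonly adopted because they are easy to implement. Many software products and websites now suggest keywords, but they can only offer derived terms that contain the query keyword, and some are limited by a fixed lexicon and therefore cannot suggest newer terms. To address these problems, this study designs a system whose front end offers two input modes, a URL or a keyword, and whose back end derives terms from the vast amount of data on the Internet. The back end builds on Pseudo Relevance Feedback, uses the Google API to retrieve the top N relevant web pages, and computes and analyzes term weights with a modified Entropy Weighting formula until the required number of derived terms is obtained. Finally, parameter values are determined through experiments; accuracy is analyzed for different query categories, the overlap among different software products is compared, and an evaluation method based on the number of relevant web pages is proposed to assess the relevance of the terms.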

Keywords

Query Rewrite; Query Expansion; Derived Keyword; Relevant Terms; Pseudo Relevance Feedback

Parallel Abstract

As more people use search engines to find what they need on the Internet, businesses must keep their search engine rankings high to remain competitive. One commonly used and easily deployed way to keep a webpage's ranking high is to put the right keywords in the page. Determining the right set of keywords, however, is not a trivial problem. Many software products and websites can derive an expanded keyword set from an initial keyword set, but the variety of the expanded set is quite limited: every expanded keyword must contain some initial keyword, and keywords are retrieved only from a fixed database of terms and phrases. To increase the variety of expanded keywords, this research proposes a method that expands keywords without these limitations. The initial set of keywords can be specified explicitly or derived from a given webpage. An expanded set of keywords is then built by first querying Google to retrieve the top n relevant pages, and then using Pseudo Relevance Feedback and a modified Entropy Weighting formula to analyze the weights of phrases and terms. We test this method on several categories of initial keywords to fine-tune the appropriate threshold values. Finally, we study the overlap of the expanded keywords generated by various software products, and propose a method to evaluate the relevance of keywords based on the number of web pages returned by Google.
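The expansion step described above can be illustrated with a minimal Python sketch. It rests on several assumptions: the abstract does not give the modified Entropy Weighting formula, so the code substitutes the standard log-entropy weighting; the Google query is replaced by a plain list of page texts that the caller has already retrieved; and the function name `expand_terms` and the parameter `k` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter


def expand_terms(query_terms, top_docs, k=5):
    """Rank candidate expansion terms from pseudo-relevant documents.

    ASSUMPTION: uses the standard log-entropy weighting, not the
    modified formula of the thesis, which the abstract does not state.
    top_docs stands in for the text of the top-n pages returned by a
    search engine; tokenization is a naive whitespace split.
    """
    n = len(top_docs)
    doc_counts = [Counter(doc.lower().split()) for doc in top_docs]

    # Total frequency of every term across the feedback documents.
    global_freq = Counter()
    for counts in doc_counts:
        global_freq.update(counts)

    query = {t.lower() for t in query_terms}
    scores = {}
    for term, gf in global_freq.items():
        if term in query:
            continue  # expansion terms need not contain the query term
        # Sum of p * log(p) over documents that contain the term.
        entropy_sum = 0.0
        for counts in doc_counts:
            tf = counts[term]
            if tf:
                p = tf / gf
                entropy_sum += p * math.log(p)
        # Global weight: 1 for a term found in one document only,
        # 0 for a term spread evenly over all documents.
        global_w = (1.0 + entropy_sum / math.log(n)) if n > 1 else 1.0
        # Local weight: log-scaled term frequency, summed over documents.
        local_w = sum(math.log(1 + counts[term]) for counts in doc_counts)
        scores[term] = global_w * local_w

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


if __name__ == "__main__":
    pages = [
        "hotel booking cheap hotel",
        "hotel booking deals",
        "hotel reviews",
    ]
    for term, score in expand_terms(["hotel"], pages, k=3):
        print(term, round(score, 3))
```

Under the standard weighting, a term that occurs evenly across all feedback pages receives a weight of zero, which may be one reason a modified formula is needed when every retrieved page is presumed relevant.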

Parallel Keywords

Query Rewrite; Query Expansion; Derived Keyword; Relevant Terms; Pseudo Relevance Feedback
