A Term Analysis System Based on Search Engine Results

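With the spread of the Internet, looking up information through search engines has become commonplace. However, when users query a term, they often find many similar terms and cannot tell which one is correct. If the terms that occur most often, or that are most similar to the query, could be listed from the large volume of data a search engine returns, users might quickly find the terms that more people use and that are more likely to be correct. In addition, some similar terms arise from rearranging the characters of a term. This thesis therefore implements a system that finds, in real time, the Chinese terms related to a user's query, compiles statistics on these related terms, and ranks them primarily by frequency and similarity. If the query is a line from the Three Hundred Tang Poems or an idiom, the system also shows its source. Finally, the system presents the results in the order the user specifies.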

Keywords

related terms; search engine; text mining; Three Hundred Tang Poems; idioms

Parallel Abstract

With the growth of the Internet, using search engines to look up information has become commonplace. Users of a search engine often encounter many similar terms, which makes it hard to determine which term is correct. If the most frequent and most similar terms can be extracted from the search engine's results, they may help the user identify the most accurate term. In addition, some similar terms arise from rearranging the characters of a term. We therefore propose a term analyzer that lists the top-ranking terms sorted by frequency or similarity. If the query term is a line from the Three Hundred Tang Poems or a Chinese idiom, the system also shows its source. Finally, the system presents the results according to the criteria specified by the user.

Parallel Keywords

relevance-terms; search engine; text retrieval; 300 Tang poems; Chinese idiom
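
The ranking step described in the abstract (count related terms in search results, then sort by frequency or by similarity to the query) can be illustrated with a minimal sketch. The abstract does not say how candidate terms are extracted or how similarity is measured, so everything here is an assumption for illustration: candidates are taken to be all substrings of the same length as the query, similarity is the character-level ratio from Python's `difflib.SequenceMatcher`, and the function name `rank_terms`, the `min_similarity` cutoff, and the sample snippets are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def rank_terms(query, snippets, sort_by="frequency", min_similarity=0.6):
    """Rank candidate terms found in search-result snippets.

    Candidates are all substrings with the same length as the query
    (an illustrative assumption, not the thesis's extraction method).
    """
    n = len(query)
    counts = Counter(
        text[i:i + n] for text in snippets for i in range(len(text) - n + 1)
    )
    scored = []
    for term, freq in counts.items():
        # Character-level similarity in [0, 1]; 1.0 means identical to the query.
        sim = SequenceMatcher(None, query, term).ratio()
        if sim >= min_similarity:
            scored.append((term, freq, sim))
    if sort_by == "frequency":
        scored.sort(key=lambda t: (-t[1], -t[2]))
    else:  # any other value sorts by similarity first
        scored.sort(key=lambda t: (-t[2], -t[1]))
    return scored


# Hypothetical snippets standing in for text returned by a search engine.
snippets = ["按部就班地完成工作", "做事要按部就班", "他按步就班地學習"]
by_frequency = rank_terms("按步就班", snippets, sort_by="frequency")
by_similarity = rank_terms("按步就班", snippets, sort_by="similarity")
```

With these sample snippets, sorting by frequency places 按部就班 first because it occurs twice, while sorting by similarity places the query string 按步就班 itself first. This mirrors the abstract's idea that the more frequently used form is a hint at the more accurate term.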
