A Study of a Real-Time Clustering System for Chinese and English Search Results

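This study implements a real-time clustering system for Chinese and English search results that runs on multiple platforms. To accommodate the differing bandwidth and processing capabilities of these platforms, the system analyzes only the titles, snippets, and URLs of search results. To address shortcomings in earlier work on mobile devices, we propose the following improvements: a refined version of the earlier Chinese word segmentation method, together with an extraction-and-reference mechanism for frequently used Chinese and English terms; a combined feature-term selection mechanism built on semantic relatedness, title weighting, and related methods; hyperlink similarity computation, plus "best search results" and "pre-defined clusters" as clustering aids; and cluster labels that better match users' reading habits, selected through representativeness calculation and term merging. The clustering evaluation shows that "best search results" and "pre-defined clusters" produce no significant change in the space density ratio of the search results, but a user satisfaction survey shows that participants consider these two mechanisms to yield better clustering results. In addition, a comparison with the well-known Clusty system shows that users are more satisfied with the proposed system's Chinese search results, while satisfaction with the two systems is very close for English search results.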

Keywords

Web page clustering; real-time clustering; meta search; feature term selection; Chinese word segmentation

Parallel Abstract

This study proposes a bilingual Web search results clustering system. Clustering is performed on the fly by processing only the titles, snippets, and URLs of search results retrieved from popular search engines. Several improvements have been made to the feature selection stage, including a refined Chinese word segmentation algorithm, a newly designed mechanism for extracting and referring to frequently used phrases, and an integrated feature selection process based on lexical affinity and title weighting. Additional mechanisms, namely URL similarity, a best search results cluster (BRC), and pre-defined clusters (PDC), are used to assist clustering. Furthermore, cluster labeling is achieved by representativeness measurement and label term rearrangement. The experimental results on space density ratio show that BRC and PDC do not significantly improve clustering quality, but the results of a user study reveal that users prefer the clustering results produced with BRC and PDC. We also compare our system with Clusty, the best-known Web snippet clustering engine, by Vivisimo.com. The user study results indicate that users are more satisfied with our system than with Clusty on Chinese search results, and are comparably satisfied with the two systems on English search results.

Parallel Keywords

Web Clustering; Ephemeral Clustering; Meta Search; Feature Selection; Chinese Word Segmentation

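Cited By

黃挺立 (2013). 基於語意相關詞的搜尋結果分群 [Search result clustering based on semantically related terms] [Master's thesis, Yuan Ze University]. Airiti Library. https://doi.org/10.6838/YZU.2013.00253

許巧靜 (2011). 類別相關詞對搜尋引擎的搜尋結果排名之影響 [The effect of category-related terms on the ranking of search engine results] [Master's thesis, Yuan Ze University]. Airiti Library. https://doi.org/10.6838/YZU.2011.00190

林渝翔 (2011). 一個產生長詞與新詞的中文混合斷詞系統 [A hybrid Chinese word segmentation system that generates long words and new words] [Master's thesis, Yuan Ze University]. Airiti Library. https://doi.org/10.6838/YZU.2011.00155

楊盛帆 (2009). 以整合式規則來做網路論壇上的 3C 產品口碑分析 [Word-of-mouth analysis of 3C products on Web forums using integrated rules] [Master's thesis, Yuan Ze University]. Airiti Library. https://doi.org/10.6838/YZU.2009.00224

楊智捷 (2009). 一個以語意引導的動態分群系統 [A semantics-guided dynamic clustering system] [Master's thesis, Yuan Ze University]. Airiti Library. https://doi.org/10.6838/YZU.2009.00223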
