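This study implements a real-time clustering system for Chinese and English search results that is suitable for multiple platforms. To accommodate the differing bandwidth and processing capabilities of these platforms, the study analyzes only the titles, snippets, and URLs of search results. To address shortcomings of earlier research on mobile devices, we propose the following improvements: we refine the Chinese word segmentation method of the earlier work and establish an extraction-and-reference mechanism for frequently used Chinese and English phrases; we build a joint feature-term selection mechanism using semantic relatedness, title weighting, and related methods; we add URL similarity computation, a "best search results" cluster, and "pre-defined clusters" as aids during clustering; and we select cluster labels that better match users' reading habits through representativeness calculation and term merging. Clustering evaluation experiments show that the "best search results" cluster and "pre-defined clusters" produce no significant change in the space density ratio of the search results, but a user satisfaction survey shows that participants consider these two mechanisms to provide better clustering results. In addition, a comparison with the well-known Clusty system shows that users are more satisfied with the Chinese search results of the proposed system, while for English search results, user satisfaction with the two systems is quite close.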
This study proposes a bilingual Web search results clustering system. Clustering is performed on the fly by processing only the titles, snippets, and URLs of search results retrieved from popular search engines. Several improvements have been made to the feature selection stage, including a refined Chinese word segmentation algorithm, a newly designed mechanism for extracting and referring to frequently used phrases, and an integrated feature selection process based on lexical affinity and title weighting. Additional mechanisms, namely URL similarity, a best search results cluster (BRC), and pre-defined clusters (PDC), are used to assist clustering. Furthermore, cluster labels are produced by representativeness measurement and label term rearrangement. The experimental results on space density ratio show that BRC and PDC do not significantly improve clustering quality, but the results of a user study reveal that users prefer the clustering results with BRC and PDC. We also compare our system with Clusty, the best-known Web snippet clustering engine, by Vivisimo.com. The user study results indicate that users are more satisfied with our system than with Clusty on Chinese search results, and are comparably satisfied with both systems on English search results.
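As an illustration of how URL similarity can assist clustering, the following is a minimal sketch in Python. The abstract does not define the measure used in this study, so the sketch assumes a simple Jaccard overlap between host and path tokens; the function names `url_tokens` and `url_similarity` are hypothetical and are not taken from the system itself.

```python
import re
from urllib.parse import urlparse


def url_tokens(url: str) -> set[str]:
    """Split a URL's host and path into a set of lowercase tokens.

    Assumption: tokens are the pieces between '.' and '/' characters,
    and a 'www' label carries no information, so it is dropped.
    """
    parsed = urlparse(url.lower())
    parts = re.split(r"[./]", parsed.netloc + parsed.path)
    return {p for p in parts if p and p != "www"}


def url_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two URLs' token sets (assumed measure)."""
    tokens_a, tokens_b = url_tokens(a), url_tokens(b)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    # Two pages on the same site and section score higher than unrelated pages.
    print(url_similarity("http://www.example.com/news/a.html",
                         "http://example.com/news/b.html"))
    print(url_similarity("http://foo.org/x", "http://bar.net/y"))
```

Under this assumption, two results from the same site and directory share most tokens and score high, so a clustering step could combine such a score with text-based similarity when deciding whether two results belong together.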