A Study on Hierarchical Dynamic Clustering of Web Search Results

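This study proposes a hierarchical clustering method that dynamically clusters web search results, so that users can quickly find the web pages they are interested in by browsing a cluster tree. The method extracts feature terms from the page titles and snippets of the search results, and then selects the clustering concepts, cluster labels, and number of clusters according to a combined indicator of each feature term's page coverage and distinctiveness. The method allows a web page to be assigned to multiple clusters, and it places pages that originally ranked higher in the leading clusters as far as possible. Using an implemented system, this study selects the stopping condition for clustering from the preliminary reach time performance on popular Chinese and English search keywords, and then compares performance through a user satisfaction test and through the system's reach time on Chinese and English keywords. The experimental results show that the proposed method is clearly better than the commercial clustering system Vivisimo and slightly better than DisCover, a related method that also produces hierarchical clusters.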

Keywords

Document clustering; web search; hierarchical clustering; dynamic clustering; multiple clustering

Parallel Abstract

This study proposes a hierarchical clustering method for dynamic clustering of web search results. The resulting tree of clusters helps users efficiently locate the web pages they are interested in. The proposed method extracts feature tokens from the page titles and snippets of search results and, based on an indicator calculated from the coverage and distinctiveness of these feature tokens, determines the clustering concepts, the cluster labels, and the number of clusters. The method also allows a web page to be grouped into several clusters, and it pushes high-ranking web pages into the leading clusters. The clustering termination condition was determined from preliminary reach time results for several popular Chinese and English keywords. A user study with popular English and Chinese keywords showed that users are more satisfied with the proposed system than with the commercial system Vivisimo, and slightly more satisfied with it than with the related method DisCover. A performance measurement on reach time confirmed that the proposed system outperforms Vivisimo and performs slightly better than DisCover.

Parallel Keywords

Document Clustering; Web Search; Hierarchical Clustering; Dynamic Clustering; Overlap Clustering
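
The label selection step summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formula for the combined indicator, so the weighted sum of coverage and distinctiveness used here, the weight `alpha`, the `min_pages` threshold, the `max_clusters` stopping rule, and all function names are assumptions made for illustration. The sketch is also flat (one level); a hierarchy could plausibly be obtained by applying the same step again inside each cluster.

```python
import re
from collections import defaultdict
from fractions import Fraction


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def cluster_results(results, stopwords=frozenset(), max_clusters=5,
                    min_pages=2, alpha=Fraction(1, 2)):
    """Greedily pick label terms for (title, snippet) pairs given in rank order.

    Returns a list of (label, [ranks of member pages]). A page may appear in
    several clusters. The scoring formula is an assumed stand-in.
    """
    # Map each candidate term to the set of result ranks that contain it.
    postings = defaultdict(set)
    for rank, (title, snippet) in enumerate(results):
        for term in tokenize(title + " " + snippet) - set(stopwords):
            postings[term].add(rank)

    total = len(results)
    covered = set()   # pages already placed in at least one cluster
    chosen = {}       # label -> sorted ranks of its member pages
    while len(chosen) < max_clusters:
        best_key, best_term = None, None
        for term, pages in postings.items():
            new_pages = pages - covered
            if term in chosen or len(pages) < min_pages or not new_pages:
                continue
            coverage = Fraction(len(pages), total)
            # Assumed notion of distinctiveness: the share of this term's
            # pages that no earlier cluster has covered yet.
            distinctiveness = Fraction(len(new_pages), len(pages))
            score = alpha * coverage + (1 - alpha) * distinctiveness
            # Ties go to the term whose best page ranks higher.
            key = (score, -min(pages), term)
            if best_key is None or key > best_key:
                best_key, best_term = key, term
        if best_term is None:
            break  # no remaining term adds an uncovered page
        chosen[best_term] = sorted(postings[best_term])
        covered |= postings[best_term]

    # Order clusters so that those holding higher-ranked pages come first.
    return sorted(chosen.items(), key=lambda item: item[1][0])


# Hypothetical search results for the query "jaguar", in original rank order.
RESULTS = [
    ("Jaguar cars official site", "Luxury cars and SUVs"),
    ("Jaguar animal facts", "The jaguar is a big cat animal"),
    ("Used Jaguar cars for sale", "Find used cars near you"),
    ("Jaguar habitat", "Animal conservation in the Amazon"),
    ("Jaguar cars review", "Animal mascot on the hood"),
]
STOPWORDS = {"jaguar", "the", "a", "is", "and", "for", "in", "on", "you"}

clusters = cluster_results(RESULTS, stopwords=STOPWORDS)
print(clusters)  # [('cars', [0, 2, 4]), ('animal', [1, 3, 4])]
```

In this toy run the fifth result (rank 4) mentions both terms, so it lands in both clusters, which mirrors the overlapping assignment described in the abstract. Exact fractions are used for the score only to keep tie-breaking deterministic.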

