
A Semantically Guided Dynamic Clustering System

Advisor: 陸承志

Abstract


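This study proposes a semantically guided dynamic clustering system. The system uses meta search: it sends the user's query keywords to Google, Yahoo, and MSN, and then applies semantically guided clustering to the first three pages of results returned by the search engines. The workflow has two stages: batch training and semantically guided clustering. Batch training consists of three steps: building the semantic RDF, building the domain-related terms, and generating association classification rules. Semantically guided clustering consists of four steps, which produce the best cluster, the predefined clusters, the yet-to-be-retrieved clusters, and the dynamic clusters. The best cluster contains one or more web pages that are most relevant to the query keywords, selected from the top-ranked search results. The predefined and yet-to-be-retrieved clusters are both clusters that are semantically related to the query keywords; we take the results returned by Google Directory, add manual fine-tuning, and use them to find topics related to the query keywords, which then serve as the predefined and yet-to-be-retrieved clusters. Next, we use the feature terms in each web page's content to match and trigger the most suitable association classification rule, which assigns the page to the predefined cluster that the rule specifies; if a cluster's title is not semantically related to the query keywords, that cluster is treated as a yet-to-be-retrieved cluster. Finally, this study evaluates the clustering performance of the semantically guided clustering system and surveys user satisfaction. The experimental results show that the Precision, Recall, and F1-measure of the first-layer semantically guided clustering are 0.96, 0.90, and 0.92, respectively. For the 3C category in the second layer, the average Precision, Recall, and F1-measure are 0.99, 0.98, and 0.99, respectively, which indicates that this method is effective at assigning web pages to predefined clusters. The survey results show that users are clearly more satisfied with this system than with Clusty, a commercial clustering system, and with a pure K-Means system.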
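
The rule-triggering step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the `Rule` class, the `assign_cluster` function, the subset test used to decide whether a rule fires, the choice of the highest-confidence rule as the "most suitable" one, and all sample rules and terms are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical association classification rule: feature terms -> cluster."""
    terms: frozenset      # feature terms that must all appear in the page
    cluster: str          # predefined cluster the rule points to
    confidence: float     # assumed rule confidence from batch training


def assign_cluster(page_terms, rules, fallback="dynamic"):
    """Return the cluster of the best triggered rule, or `fallback` if none fires."""
    page_terms = set(page_terms)
    # A rule is "triggered" here when all of its terms occur in the page (assumption).
    triggered = [r for r in rules if r.terms <= page_terms]
    if not triggered:
        # No rule matched: leave the page for the dynamic clustering step.
        return fallback
    # "Most suitable" is assumed to mean highest confidence, then most specific.
    best = max(triggered, key=lambda r: (r.confidence, len(r.terms)))
    return best.cluster


# Made-up rules and pages, for illustration only.
rules = [
    Rule(frozenset({"camera", "lens"}), "Digital Cameras", 0.92),
    Rule(frozenset({"camera"}), "Photography", 0.60),
    Rule(frozenset({"laptop", "battery"}), "Notebooks", 0.88),
]

pages = {
    "page-1": {"camera", "lens", "review"},
    "page-2": {"laptop", "battery", "price"},
    "page-3": {"recipe", "noodles"},
}

assignments = {name: assign_cluster(terms, rules) for name, terms in pages.items()}
```

In this sketch, pages that trigger no rule fall through to a `"dynamic"` bucket, mirroring the abstract's separation between predefined clusters and the later dynamic clustering step.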

Parallel Abstract


This study proposes a semantically guided dynamic clustering system based on a meta-search mechanism. The proposed system sends the user's queries to Google, Yahoo, and MSN simultaneously and then analyzes the retrieved results through semantically guided clustering. The system flow is divided into two stages: batch training and online semantically guided clustering. Batch training includes processes for creating the semantic RDF, creating field association terms, and generating association rules. Online semantically guided clustering includes processes for generating the best and predefined clusters and for conducting K-Means clustering. The best cluster contains the websites most relevant to the query term and is placed at the top of the clustering result pages. The predefined clusters are those that are semantically relevant to the query, as suggested by Google Directory with minor human adjustment. Web pages are allocated to predefined clusters by the association rules that the feature words within those pages trigger. Predefined cluster titles that have no web pages allocated to them are called yet-to-be-retrieved clusters. Finally, this study evaluates the effectiveness of the semantically guided clustering system. The experimental results indicate that the average precision, recall, and F1-measure are 0.96, 0.90, and 0.92, respectively, in the first layer, and 0.99, 0.98, and 0.99, respectively, in the second layer for 3C products. In addition, the results of a user study indicate that subjects are more satisfied with our system than with a pure K-Means system or with Clusty, a commercial system.
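
For readers unfamiliar with the reported metrics, the sketch below shows the standard definitions of precision, recall, and F1-measure, plus a per-cluster (macro) average. The function names and the counts are illustrative assumptions; the abstract does not state how the thesis averaged its figures.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(per_cluster_counts):
    """Average each metric over clusters; input is a list of (tp, fp, fn) tuples."""
    scores = [precision_recall_f1(*counts) for counts in per_cluster_counts]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Made-up counts for two clusters, for illustration only.
counts = [(45, 5, 5), (30, 0, 10)]
avg_precision, avg_recall, avg_f1 = macro_average(counts)
```

Because averaged scores are computed per cluster and then combined, an averaged F1-measure need not equal the F1 value computed directly from the averaged precision and recall.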

