This study proposes a semantically guided dynamic clustering system built on a meta search mechanism. The system sends the user's query keywords to Google, Yahoo, and MSN simultaneously, then applies semantically guided clustering to the first three pages of results returned by the search engines.
The system flow is divided into two stages: batch training and online semantically guided clustering. Batch training consists of three steps: building the semantic RDF, building field association terms, and generating association classification rules. Semantically guided clustering consists of four steps, which generate the best cluster, the predefined clusters, the yet-to-be-retrieved clusters, and the dynamic clusters (produced by K-Means clustering). The best cluster contains one or more web pages most relevant to the query keywords, selected from the top-ranked search results, and is placed at the top of the clustering result page. The predefined and yet-to-be-retrieved clusters are semantically related to the query keywords; their topics are taken from the results returned by Google Directory, with minor manual adjustment. Feature words in each web page are matched against the association classification rules, and the best-matching rule that is triggered assigns the page to the predefined cluster it specifies. Predefined cluster titles to which no web pages are allocated are called yet-to-be-retrieved clusters.
Finally, this study evaluates the clustering effectiveness of the system and surveys user satisfaction. The experimental results show that the precision, recall, and F1-measure of first-layer semantically guided clustering are 0.96, 0.90, and 0.92, respectively. For the 3C product categories in the second layer, the average precision, recall, and F1-measure are 0.99, 0.98, and 0.99, respectively, which indicates that this method assigns web pages to predefined clusters effectively. The user study shows that subjects were markedly more satisfied with the proposed system than with a pure K-Means system or Clusty, a commercial clustering system.
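The rule-triggered assignment step described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the rule format (a set of feature words mapped to a cluster title), the example rules and cluster titles, the function name `assign_pages`, and the tie-breaking choice are all assumptions made for illustration. A page is assigned to the predefined cluster named by the most specific rule whose feature words all occur in the page; cluster titles that receive no pages are reported separately, and pages that trigger no rule are collected, here assumed to be the input to a later dynamic clustering step.

```python
from collections import defaultdict

# Hypothetical association rules: (required feature words, predefined cluster title).
RULES = [
    (frozenset({"lens", "megapixel"}), "Digital Cameras"),
    (frozenset({"battery", "touchscreen"}), "Mobile Phones"),
    (frozenset({"cpu", "keyboard"}), "Notebooks"),
]


def assign_pages(pages, rules):
    """Split pages into rule-assigned clusters, empty cluster titles, and leftovers.

    pages maps a page identifier to the set of feature words found in that page.
    """
    clusters = defaultdict(list)
    leftovers = []
    for url, words in pages.items():
        # A rule fires when all of its feature words occur in the page.
        fired = [
            (len(antecedent), title)
            for antecedent, title in rules
            if antecedent <= words
        ]
        if fired:
            # Assumed tie-break: prefer the rule with the most feature words;
            # remaining ties fall back to alphabetical order of the title.
            _, title = max(fired)
            clusters[title].append(url)
        else:
            leftovers.append(url)
    # Predefined titles that received no pages.
    empty_titles = [title for _, title in rules if title not in clusters]
    return dict(clusters), empty_titles, leftovers
```

Calling `assign_pages` with a dictionary of pages returns three values: the populated predefined clusters, the titles left without pages, and the unassigned pages.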