Thesis

基於多樣性排序及罩蓋分群概念建立標籤階層

Building Tag Hierarchy Based on Diversity Ordering and Canopy Clustering Concepts

Advisor: 吳宜鴻

Abstract


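Two properties of tags often make them hard to cluster: polysemy, where one tag has several meanings, and synonymy, where different tags may mean the same thing. Polysemy lets the same tag belong to different semantic clusters, while synonymy makes different tags clearly semantically related. We therefore use the canopy concept so that a tag can appear in more than one cluster, and we use a tag hierarchy so that synonyms are arranged in levels according to their semantic scope. When a tag hierarchy is built, the order in which tags are added directly affects how they are placed in it, so we propose diversity broadness. The higher a tag's diversity broadness, the more different types of tags it is similar to; such a tag may carry more meanings or, viewed from the structure of the tag network, may connect to a larger set of tags. Building on tag neighborhood relationships, this thesis first forms small, precise micro-clusters from pairs of tags that are highly similar in the sense that each counts the other among its top-k nearest tags. It then merges micro-clusters, based on the relationships between micro-clusters and the tags they contain, until a preset number of clusters is reached. For evaluation, we designed a hierarchical synthetic data generator that controls the probability of cluster overlap and uses an early-termination probability to keep nodes from becoming too deep. We compare the generated hierarchy, which has the original number of clusters, with the tag hierarchy our method builds under a limited number of clusters, and measure the consistency of the clusters at each level. Tests on different synthetic datasets show that our method clusters better than the other methods under almost all conditions, and that it performs best when cluster overlap is low.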
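The micro-clustering and merging steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the thesis's implementation: tags are assumed to be sparse co-occurrence vectors compared by cosine similarity, a micro-cluster is assumed to be a tag plus every tag that is its mutual top-k neighbor, and micro-clusters are merged greedily by Jaccard overlap, a stand-in for the merge criterion, which the abstract does not specify. The functions `top_k_neighbors`, `micro_clusters`, and `merge_to` and the toy data are all hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k_neighbors(vectors, k):
    """Each tag's k most similar other tags; zero-similarity tags are dropped."""
    neighbors = {}
    for t, vec in vectors.items():
        scored = sorted(
            ((cosine(vec, vectors[o]), o) for o in vectors if o != t),
            key=lambda p: (-p[0], p[1]),  # highest similarity first, ties by name
        )
        neighbors[t] = {o for s, o in scored[:k] if s > 0}
    return neighbors


def micro_clusters(vectors, k):
    """Hypothetical rule: a tag plus every tag that is a mutual top-k neighbor of it."""
    nbrs = top_k_neighbors(vectors, k)
    found = {frozenset({t} | {o for o in nbrs[t] if t in nbrs[o]}) for t in vectors}
    # Micro-clusters may overlap, so one tag can sit in several of them.
    return sorted((set(c) for c in found), key=sorted)


def jaccard(a, b):
    """Overlap of two tag sets (assumed merge criterion)."""
    return len(a & b) / len(a | b)


def merge_to(clusters, n_clusters):
    """Greedily merge the most-overlapping pair until n_clusters remain."""
    clusters = [set(c) for c in clusters]
    while len(clusters) > n_clusters:
        # If no pair overlaps, every score is 0 and the first pair is merged.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: jaccard(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


# Toy tag vectors: weights are co-occurrence counts with resources a, b, x, y.
vectors = {
    "code": {"a": 2, "b": 1},
    "java": {"a": 1, "b": 2},
    "python": {"a": 1, "b": 1, "x": 1, "y": 1},  # polysemous: language and snake
    "snake": {"x": 2, "y": 1},
    "reptile": {"x": 1, "y": 2},
}
micro = micro_clusters(vectors, k=4)
print(micro)
print(merge_to(micro, 2))
```

With k = 4 on this toy input, the polysemous tag `python` lands in all three micro-clusters, which illustrates the overlap the canopy concept allows. Sets print their elements in an arbitrary order, so the printed text may differ between runs.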

Keywords

diversity, tag hierarchy, canopy, tag

English Abstract


Two characteristics of tags, polysemy and synonymy, often make clustering difficult. Because of polysemy, a single tag may belong to several semantic clusters; because of synonymy, different tags may be closely related in meaning. Therefore, we adopt the canopy concept so that a tag can appear in more than one cluster, and we build a tag hierarchy that organizes synonyms level by level according to their semantic coverage. During construction, the order in which tags are added directly affects the resulting hierarchy. Hence we propose a measure called diversity broadness: the higher a tag scores, the more types of tags it is similar to, and the more meanings it may have. From the perspective of a tag network, a tag with a high score may connect to a much larger group of tags. Based on tag similarity, this thesis first divides the tags into small but precise micro-clusters using mutual k-nearest neighbors (kNN), that is, pairs of tags that each rank the other among their k most similar tags. These micro-clusters are then merged step by step according to their degree of overlap until the predefined number of clusters is obtained. To evaluate the method, we design a data generator that creates a tag hierarchy while controlling the degree of overlap among clusters, and we compare the tag hierarchy built by our method with the one produced by the generator. In a series of experiments on different datasets, our method outperforms the other methods under almost all conditions, and it performs best when the overlap among clusters is low.
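The abstract names the canopy concept and diversity broadness but gives no formulas. The Python sketch below shows one possible reading under explicit assumptions: classic canopy clustering in the style of McCallum, Nigam and Ungar (2000), with distance 1 - cosine, a loose threshold `t1` and a tight threshold `t2` (t1 > t2), followed by a hypothetical diversity-broadness proxy that simply counts how many canopies each tag falls into. The thesis's actual definition may differ; `canopies`, `diversity_order`, the thresholds, and the toy vectors are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse {feature: weight} vectors."""
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def canopies(vectors, t1, t2):
    """Canopy clustering with distance 1 - cosine, loose threshold t1 > tight threshold t2."""
    if not t1 > t2:
        raise ValueError("canopy clustering needs t1 > t2")
    remaining = sorted(vectors)  # fixed order for reproducibility; any order is allowed
    result = []
    while remaining:
        center = remaining[0]
        dist = {t: 1.0 - cosine(vectors[center], vectors[t]) for t in vectors}
        # Every tag within the loose threshold joins this canopy, so tags can repeat.
        result.append({center} | {t for t, d in dist.items() if d < t1})
        # Only the center and tags within the tight threshold stop being candidate centers.
        remaining = [t for t in remaining if t != center and dist[t] >= t2]
    return result


def diversity_order(vectors, t1, t2):
    """Hypothetical diversity-broadness proxy: the number of canopies a tag falls into."""
    counts = {t: 0 for t in vectors}
    for canopy in canopies(vectors, t1, t2):
        for t in canopy:
            counts[t] += 1
    # Broadest tags first; ties broken alphabetically.
    return sorted(vectors, key=lambda t: (-counts[t], t))


vectors = {
    "code": {"a": 2, "b": 1},
    "java": {"a": 1, "b": 2},
    "python": {"a": 1, "b": 1, "x": 1, "y": 1},  # polysemous: language and snake
    "snake": {"x": 2, "y": 1},
    "reptile": {"x": 1, "y": 2},
}
print(diversity_order(vectors, t1=0.4, t2=0.25))
# ['python', 'code', 'java', 'reptile', 'snake']
```

Here the polysemous tag `python` falls into all three canopies and is ordered first, so under this proxy it would be added to the hierarchy before the narrower tags. Using a fixed rather than random center order is a design choice that keeps the output reproducible.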

English Keywords

canopy, tag, tag hierarchy, diversity

