Handling the Nonlinear Relationship Between User Perception Features and Data Features in Semi-Supervised Clustering

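Semi-supervised clustering methods are used to find clusterings that agree with side information provided by users. However, because of sampling bias, the results found by many clustering methods often differ greatly from the clustering the user actually has in mind. Traditional instance-level side information may fall on data points that mislead the algorithm, leading to wrong clusters. To address this problem, a related paper proposed using feature-level side information: perception vectors. However, the existing method assumes that the relationship between data features and user perception features is linear. This assumption prevents the existing method from capturing the nonlinear relationship between the two in some applications. In this thesis, we propose two nonlinear methods, Nonlinear Perception Embedded (NPE) and Neural Network (NN), to capture the nonlinear relationship between data features and user perception features and to obtain better clustering performance.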

Keywords

clustering; semi-supervised clustering; user perception; nonlinear; nonlinear embedding; sampling bias; perception vector

Parallel Abstract

Semi-supervised clustering algorithms have been proposed to identify data clusters that align with side information provided by users. However, the identified clusters are often still far from the true clusters perceived by users, mainly because of sampling bias: traditional instance-level side information may cover only a few non-randomly sampled instances, which mislead the algorithms toward wrong clusters. To overcome this problem, a related work proposes learning from feature-level side information, namely perception vectors. However, the existing method assumes a linear correlation between the data features and the perception features, and therefore cannot capture the nonlinear correlation that arises in some applications. In this paper, we propose two approaches, Nonlinear Perception Embedded (NPE) and Neural Network (NN), to capture the nonlinear correlation between data and perception features and to achieve better clustering performance.

Parallel Keywords

clustering; semi-supervised clustering; user perception; nonlinear; nonlinear embedding; sampling bias; perception vector

