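Semi-supervised clustering methods are used to find clustering results that agree with the side information provided by users. However, because of sampling bias, the results found by many clustering methods often differ greatly from the clustering the user actually has in mind. Traditional instance-level side information may fall on data points that mislead the algorithm, leading to incorrect clustering results. To address this problem, a related paper proposed using feature-level side information: perception vectors. However, the current method assumes that the relationship between the data features and the user's perception features is linear. This assumption prevents the current method from capturing the nonlinear relationship between the two in some applications. In this paper, we propose two nonlinear methods, Nonlinear Perception Embedded (NPE) and Neural Network (NN), to capture the nonlinear relationship between the data features and the user's perception features and to obtain better clustering performance.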
Semi-supervised clustering algorithms have been proposed to identify data clusters that align with side information provided by users. However, the identified clusters are often still far from the true clusters perceived by users, mainly because of sampling bias: traditional instance-level side information may cover only a few non-randomly sampled instances, which mislead the algorithms toward wrong clusters. To overcome this problem, a related work proposed learning from feature-level side information in the form of perception vectors. However, the existing method assumes a linear correlation between the data features and the perception features, and therefore cannot capture the nonlinear correlation that arises in some applications. In this paper, we propose two approaches, Nonlinear Perception Embedded (NPE) and Neural Network (NN), to capture the nonlinear correlation between data features and perception features and to achieve better clustering performance.