Link Discovery with Unlabeled Data

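Many social, academic, biological, geographical, and information systems can be described as networks. Link discovery is the study of identifying hidden links in social networks. In some cases, however, labeled data for the links we want to discover cannot be obtained. In this dissertation, we study a new aspect of the link discovery problem: discovering unlabeled links. We further study two subproblems, each predicting one kind of unlabeled link: unlabeled relationship links in heterogeneous networks, and unlabeled diffusion links in homogeneous networks. The main challenge is the lack of labeled data, which rules out the direct use of traditional automatic classification methods. To address this, we design machine-learning-based frameworks that integrate diverse sources of information to discover links in unlabeled data. We also run experiments on many real-world datasets to validate the proposed methods. The results show that the proposed methods can solve this problem, and they indicate that link discovery with unlabeled data can be applied in many practical scenarios.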

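Keywords

Link discovery; Link prediction; Data mining; Machine learning; Social network; Probabilistic graphical model; Natural language processing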

Parallel Abstract

Many social, academic, biological, geographical, and information systems can be described by networks. Link discovery is the task of identifying hidden links in a social network. However, in some cases, the labels of the links to be discovered are not available. In this dissertation, we investigate this novel aspect of the link discovery task: the problem of discovering unlabeled links. Specifically, we conduct two studies, each predicting one kind of unlabeled link: links that represent unlabeled relationships in heterogeneous networks, and links that represent unlabeled diffusion in homogeneous networks. The main challenge of these tasks is the lack of labeled data, which prevents the direct use of traditional classification approaches. To address this challenge, we design learning-based frameworks that integrate diverse information and solve the corresponding link discovery problems in the two studies. We also conduct experiments on various real-world datasets to evaluate the proposed frameworks. The experimental results not only demonstrate the usefulness of the proposed models, but also indicate that discovering links without labeled data is feasible in many practical scenarios.

Parallel Keywords

Link discovery; Link prediction; Data mining; Machine learning; Social network; Probabilistic graphical model; Natural language processing
