As a prominent self-supervised learning method, graph contrastive learning is currently a popular research topic. Common contrastive learning methods require data augmentation of the input data; however, augmenting graphs is not intuitive, and inappropriate methods may destroy the graph structure, leading to poor training results. Therefore, how to augment graphs without destroying their structure, or alternatively how to perform graph contrastive learning without data augmentation, is a major challenge in this field. This paper proposes a novel framework that places no restriction on the augmentation method: it adaptively excludes samples whose structure has been destroyed and generates a new dataset consisting of the original data together with the augmented data whose structure was preserved. Specifically, we feed a batch of original data and its augmented counterpart into a trained model, compute the L2 norms between the output representations of the two batches, and collect the graphs with smaller L2 norms to form the new dataset. This design is motivated by the observation that, for a trained model, representations of same-class samples whose structure is not destroyed by augmentation lie close together in the latent space. We then train a new model on the new dataset and show that it not only outperforms a model trained on the original dataset, but also achieves accuracy comparable to or better than state-of-the-art methods.
Graph contrastive learning (GCL) has emerged as a prominent self-supervised learning method. Its efficacy often hinges on the generation of positive samples through data augmentation. Unfortunately, applying data augmentation to graphs is not intuitive: inappropriate augmentation methods may destroy the graph structure, leading to poor model performance. Thus, developing a data augmentation method that preserves the semantics of the graph, or alternatively a GCL method that requires no data augmentation, remains a significant challenge in this domain. In this paper, we propose a novel framework that is compatible with any data augmentation method while being self-adaptive. It excludes samples whose graph structure has been destroyed, creating a new dataset consisting of the original data together with the augmented data whose semantics were preserved. Specifically, we input a batch of original data and its augmented counterpart into a trained model. The L2 norms between the representations of the two batches are computed, and we extract the graphs with the smallest L2 norms. This is inspired by the observation that, for a trained model, representations of two graphs with the same label should exhibit proximity in the latent space. We then train a new model on the refined dataset. The results show that this model not only outperforms the model trained on the original dataset but also achieves competitive or better performance compared with state-of-the-art methods.
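The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `select_preserved`, the `keep_ratio` parameter, and the use of plain NumPy arrays for the encoder outputs are all hypothetical choices made here for clarity.

```python
import numpy as np

def select_preserved(orig_reprs, aug_reprs, keep_ratio=0.5):
    """Hypothetical sketch of the dataset-refinement step.

    orig_reprs: (N, d) array of representations of the original graphs,
                produced by a trained encoder (assumed given).
    aug_reprs:  (N, d) array of representations of the corresponding
                augmented graphs from the same encoder.
    keep_ratio: fraction of augmented graphs to keep (an assumed knob).

    Returns the indices of augmented graphs whose representations stay
    close to their originals (small L2 norm of the difference), i.e.
    those presumed to have preserved the graph's semantics.
    """
    # L2 norm between each original/augmented representation pair
    dists = np.linalg.norm(orig_reprs - aug_reprs, axis=1)
    k = int(len(dists) * keep_ratio)
    # Keep the k pairs with the smallest distances
    return np.argsort(dists)[:k]

# Toy usage: pair 1 drifts far from its original, so it is excluded.
orig = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
aug = np.array([[0.1, 0.0], [5.0, 5.0], [2.0, 2.1]])
kept = select_preserved(orig, aug, keep_ratio=2 / 3)
```

The refined dataset would then be formed by taking all original graphs plus the augmented graphs at the kept indices, and a fresh model would be trained on it.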