Efficient Algorithms for Training Deep and Large Graph Convolutional Networks

Advisor: Chih-Jen Lin

Abstract


Graph convolutional networks (GCNs) are a technique that has in recent years been successfully applied to many problems defined on graph structures; however, efficiently training a large-scale GCN remains very challenging. In this thesis, we carefully review the background of graph convolutional networks: we introduce the idea of the model through a node classification example and give a mathematical formulation of its optimization problem, which serves as the basis for complexity analysis.

After reviewing the background, we carefully compare existing GCN training algorithms and their potential problems. Broadly speaking, many researchers have proposed SGD-based methods to address slow training, but these methods still carry a high computational cost, and as the depth of the GCN increases, that cost grows exponentially. Other methods place heavy demands on memory, some even requiring the embedding vector of every node in the entire graph to be stored; on large-scale graph data, such methods may hit resource bottlenecks or become outright infeasible.

In this thesis we propose a fast method for training GCNs: Cluster-GCN. The method is still SGD-based, but it fully exploits the structure of the graph to accelerate training. Concretely, Cluster-GCN proceeds as follows: in a preprocessing stage, we first apply a graph clustering algorithm to partition the whole graph into multiple subgraphs; then, in each training step, we sample the nodes of one subgraph, restrict the neighborhood search of the GCN computation to that subgraph, and update the model with stochastic gradient descent.

This simple yet effective method greatly reduces both memory usage and computational cost while reaching model accuracy comparable to previous methods. In our experiments we examine the different training methods along several dimensions, including memory requirements, training time, and per-epoch convergence speed; the results show that our algorithm achieves nearly the best performance relative to other published work. To further test the scalability of the algorithm, we also create a new graph dataset, Amazon2M, whose raw data comes from product category information on the Amazon shopping website. Using information on whether consumers frequently purchase two products together, we express the relationships between products as a graph, forming a co-purchase network with 2 million nodes and 61 million edges.

Experimental results show that Cluster-GCN performs exceptionally well on Amazon2M: compared with the previous best training algorithm, it trains faster and uses less memory. Furthermore, we analyze in this thesis how to train deep GCNs effectively. Cluster-GCN avoids the expensive computation, and its training time and resource costs do not increase by much with depth. This advance also brings breakthroughs on many public datasets; for example, on the PPI dataset our algorithm successfully trains a 5-layer GCN and reaches a Micro-F1 score of 99.36, higher than the previous best result of 98.71. This shows that Cluster-GCN can effectively train deep networks, and its simplicity and effectiveness make it a foundation for training more complex and varied GCN models. We release the source code of the algorithm for free public use at https://github.com/google-research/google-research/tree/master/cluster_gcn.
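To make the training procedure above concrete, here is a minimal NumPy sketch of the cluster-restricted SGD idea. Everything in it is illustrative rather than the thesis implementation: a random node split stands in for a real graph clustering algorithm such as METIS, and a one-layer GCN with softmax cross-entropy replaces the deeper networks studied here (the released code is built on TensorFlow).

    import numpy as np

    rng = np.random.default_rng(0)

    def normalize_adj(A):
        # Add self-loops and row-normalize: D^{-1}(A + I), a common GCN propagation matrix.
        A = A + np.eye(A.shape[0])
        return A / A.sum(axis=1, keepdims=True)

    # Toy data: 120 nodes, 16 features, 3 classes, all randomly generated.
    n, f, c = 120, 16, 3
    A = (rng.random((n, n)) < 0.05).astype(float)
    A = np.maximum(A, A.T)                      # make the graph undirected
    X = rng.standard_normal((n, f))
    y = rng.integers(0, c, size=n)
    W = rng.standard_normal((f, c)) * 0.1       # one-layer GCN for brevity

    # Preprocessing: partition the nodes into clusters. This random split is a
    # stand-in for a real graph clustering algorithm (e.g., METIS).
    clusters = np.array_split(rng.permutation(n), 8)

    lr = 0.5
    for step in range(200):
        block = clusters[rng.integers(len(clusters))]    # sample one cluster
        A_sub = normalize_adj(A[np.ix_(block, block)])   # drop between-cluster edges
        Z = A_sub @ X[block] @ W                         # GCN layer restricted to the subgraph
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)                # softmax probabilities
        G = P.copy()
        G[np.arange(len(block)), y[block]] -= 1.0        # dLoss/dZ for cross-entropy
        W -= lr * (A_sub @ X[block]).T @ G / len(block)  # SGD step on this subgraph only

The essential step is the construction of A_sub: edges leaving the sampled cluster are simply dropped, so each update touches only the nodes and embeddings of one subgraph.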

Abstract (English)


Graph convolutional networks (GCNs) have been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. In this thesis, we discuss the technical background of graph convolutional networks in detail. We begin by introducing a node classification example to motivate the problem and the ideas behind GCN models, and then introduce mathematical notation to describe the GCN optimization problem. After reviewing the background, we analyze existing methods for training large-scale GCNs and discuss their potential issues. Roughly speaking, current SGD-based algorithms suffer from either a high computational cost that grows exponentially with the number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of every node in memory. To resolve these issues, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training and exploits the graph clustering structure. Cluster-GCN works as follows: at each step, it samples a block of nodes associated with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search to that subgraph. This simple but effective strategy significantly improves memory and computational efficiency while achieving test accuracy comparable to previous algorithms. To demonstrate the scalability of our algorithm, we create a new dataset, Amazon2M, with 2 million nodes and 61 million edges. When training a 3-layer GCN on Amazon2M, Cluster-GCN is faster than previous state-of-the-art methods while using much less memory. Cluster-GCN also allows us to train much deeper GCNs without much time or memory overhead, which leads to improved prediction accuracy: using a 5-layer Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI dataset, while the previous best result was 98.71. Our code is publicly available at https://github.com/google-research/google-research/tree/master/cluster_gcn.
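The exponential cost mentioned above is easy to see in a small, self-contained experiment (the graph and all parameters below are made up for illustration): to compute one node's embedding exactly, an L-layer GCN must gather every node within L hops, a receptive field that grows rapidly with depth. Node-wise sampling methods subsample these neighborhoods, but the per-node cost still compounds across layers.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 2000
    A = rng.random((n, n)) < 0.005       # sparse random graph
    A = A | A.T                          # make it undirected

    def receptive_field_size(A, seed, num_layers):
        # Count the nodes an L-layer GCN must touch to compute the embedding
        # of `seed`: the union of all nodes within L hops (breadth-first search).
        field, frontier = {seed}, {seed}
        for _ in range(num_layers):
            nxt = set()
            for u in frontier:
                nxt.update(np.flatnonzero(A[u]).tolist())
            frontier = nxt - field
            field |= nxt
        return len(field)

    for L in (1, 2, 3, 4):
        print(f"{L}-layer GCN touches {receptive_field_size(A, 0, L)} of {n} nodes")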
