Generating Heterogeneous Information Networks from Scratch for Explainable Artificial Intelligence

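Graph neural networks (GNNs) have shown excellent performance across many domains, including detecting spam users and reviews in e-commerce and solving classification problems in social networks. However, compared with fields such as computer vision or natural language processing, the scarcity of public graph datasets is a major obstacle to breakthrough research on GNN models. The problem is even more pronounced for heterogeneous information networks (HINs). As the interpretability of GNN models attracts growing attention, the need for datasets that provide a fair comparison baseline for HINs has become increasingly urgent. To address this, our research proposes synthetic heterogeneous information networks (SynHIN), a new method for generating synthetic heterogeneous information networks from scratch. Using real-world datasets as a basis, SynHIN identifies motifs in the graph dataset and summarizes the statistics of the target graph in order to create a synthetic heterogeneous information network. Our method uses in-cluster merge and out-cluster merge modules to generate a synthetic HIN from motifs. First, in the in-cluster merge phase, we generate motifs with the same label and merge them into a single cluster. This process is repeated several times to produce different clusters. We then perform an out-cluster merge to produce a complete heterogeneous information network. After merging, we apply a pruning module to ensure that the synthetic graph resembles the target real-world dataset and matches its statistical properties. SynHIN produces a synthetic heterogeneous information network dataset suited to node classification tasks, in which the initial motifs serve as the ground-truth explanations. The SynHIN framework is highly adaptable and can be adjusted to different datasets and motifs to meet user needs. It addresses the scarcity of heterogeneous information network datasets as well as the lack of datasets with ground-truth explanations for heterogeneous information networks, making it a tool for evaluating explanation models for heterogeneous graph neural networks (HGNNs). This research proposes the first method for generating synthetic heterogeneous information networks with motifs as ground-truth explanations, with the aim of evaluating the performance of HGNN explanation models. In addition, we provide a benchmark dataset for future research on heterogeneous graph explanation models. Our research establishes a new evaluation benchmark for explainable AI in the HGNN field and lays a solid foundation for its future development.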

Keywords

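graph neural networks; synthetic graphs; graph learning benchmarks; explainable artificial intelligence; heterogeneous information networks; heterogeneous network motifs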

Parallel Abstract

Graph neural networks (GNNs) have demonstrated exceptional performance in various domains, including detecting spam users and reviews in e-commerce and tackling classification problems in social networks. However, compared to fields such as computer vision or natural language processing, the scarcity of public graph datasets presents a significant hurdle to breakthrough research on GNN models. This challenge is even more pronounced for heterogeneous information networks (HINs). As the interpretation of GNN models has gained attention, the need for datasets that provide a fair comparison baseline for HINs has become increasingly urgent. To address this need, our research proposes SynHIN, a novel approach for generating synthetic heterogeneous information networks from scratch. Using real-world datasets as references, SynHIN identifies motifs within the graph dataset and summarizes the target graph statistics to create a synthetic heterogeneous information network. Our approach uses in-cluster and out-cluster merge modules to construct the synthetic HIN from motif clusters. First, in the in-cluster merge phase, we generate motifs that share the same label and merge them into a single cluster. This process is repeated multiple times to generate various clusters. Subsequently, we perform an out-cluster merge to create a complete heterogeneous graph. After merging, we apply pruning so that the synthetic graph closely aligns with the statistical properties of the target real-world dataset. SynHIN generates a synthetic heterogeneous graph dataset suitable for node classification tasks, with the initial motifs serving as ground-truth explanations. The SynHIN framework is highly adaptable and can be adjusted to different datasets and motifs to meet user requirements. It addresses the scarcity of heterogeneous graph datasets and the lack of motif ground truth in heterogeneous graphs, making it a valuable tool for evaluating interpreters of heterogeneous graph neural networks (HGNNs). This research introduces the first methodology for generating synthetic heterogeneous information networks with motif ground truths tailored for evaluating HGNN interpreter models. Additionally, we provide a benchmark dataset for future research on heterogeneous graph explainer models. Our work establishes a new evaluation benchmark for explainable AI in the field of HGNNs, laying a solid foundation for further advancements.
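The pipeline described in the abstract (motif generation, in-cluster merge, out-cluster merge, pruning) can be illustrated with a small toy sketch. This is not the SynHIN implementation: the `HeteroGraph` class, the paper/author/venue motif, every function name and parameter, and the use of a plain edge-count target in place of the dataset statistics are all assumptions made for illustration only.

```python
import random
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Toy heterogeneous graph (hypothetical structure, not from the thesis)."""
    node_type: dict = field(default_factory=dict)  # node id -> type name
    label: dict = field(default_factory=dict)      # target node id -> class label
    edges: set = field(default_factory=set)        # undirected edges as frozensets
    motif_edges: set = field(default_factory=set)  # ground-truth explanation edges

    def add_edge(self, u, v, is_motif=False):
        edge = frozenset((u, v))
        self.edges.add(edge)
        if is_motif:
            self.motif_edges.add(edge)


def add_motif(g, label, next_id):
    """Assumed motif: one labeled 'paper' linked to one 'author' and one 'venue'."""
    p, a, v = next_id, next_id + 1, next_id + 2
    g.node_type.update({p: "paper", a: "author", v: "venue"})
    g.label[p] = label
    g.add_edge(p, a, is_motif=True)
    g.add_edge(p, v, is_motif=True)
    return [p, a, v], next_id + 3


def in_cluster_merge(g, label, n_motifs, n_links, next_id, rng):
    """Generate same-label motifs and join them into one cluster."""
    motifs = []
    for _ in range(n_motifs):
        nodes, next_id = add_motif(g, label, next_id)
        motifs.append(nodes)
    for _ in range(n_links):
        m1, m2 = rng.sample(motifs, 2)
        # Link a paper to an author or venue of a different motif.
        g.add_edge(m1[0], rng.choice(m2[1:]))
    return [n for m in motifs for n in m], next_id


def out_cluster_merge(g, clusters, n_links, rng):
    """Connect different clusters with random paper-to-author/venue edges."""
    for _ in range(n_links):
        c1, c2 = rng.sample(clusters, 2)
        u = rng.choice([n for n in c1 if g.node_type[n] == "paper"])
        v = rng.choice([n for n in c2 if g.node_type[n] != "paper"])
        g.add_edge(u, v)


def prune(g, target_edges, rng):
    """Drop random non-motif edges until the edge count meets the target.

    Assumption: a single edge-count target stands in for the richer
    statistics of a real reference dataset.
    """
    removable = sorted(g.edges - g.motif_edges, key=sorted)
    rng.shuffle(removable)
    while len(g.edges) > target_edges and removable:
        g.edges.remove(removable.pop())


def build_synthetic_hin(n_labels=2, clusters_per_label=2, motifs_per_cluster=3,
                        in_links=4, out_links=6, target_edges=30, seed=0):
    rng = random.Random(seed)
    g, clusters, next_id = HeteroGraph(), [], 0
    for label in range(n_labels):
        for _ in range(clusters_per_label):
            cluster, next_id = in_cluster_merge(
                g, label, motifs_per_cluster, in_links, next_id, rng)
            clusters.append(cluster)
    out_cluster_merge(g, clusters, out_links, rng)
    prune(g, target_edges, rng)
    return g


if __name__ == "__main__":
    hin = build_synthetic_hin()
    print(len(hin.node_type), "nodes,", len(hin.edges), "edges,",
          len(hin.motif_edges), "ground-truth motif edges")
```

In this sketch, pruning never removes motif edges, so the ground-truth explanation attached to each labeled node survives into the final graph. An explainer could then be scored by how well the edges it highlights for a node overlap with `motif_edges`.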

Parallel Keywords

graph neural networks; synthetic graphs; graph learning benchmarks; explainable artificial intelligence; heterogeneous information networks; heterogeneous network motifs

