Abstract

Based on the community structure characteristics, theory, and methods of frequent subgraph mining, network motifs findings are firstly introduced into social network analysis; the tendentiousness evaluation function and the importance evaluation function are proposed for effectiveness assessment. Compared with the traditional way based on nodes centrality degree, the new approach can be used to analyze the properties of social network more fully and judge the roles of the nodes effectively. In application analysis, our approach is shown to be effective.

1. Introduction

A large number of systems in the real world exist as networks, such as social networks (coauthor network, criminal networks, etc.), biological networks (protein interaction networks, metabolic networks, etc.), and technology networks (electricity networks, the Internet, etc.) [112]. In order to reveal their structure and principle, Milo et al. first proposed the concept of “network motifs,” which can be defined as patterns of interconnections occurring in complex networks at numbers that are significantly higher than those in randomized network [13]. Later, research on network motifs has been developed extensively. Kim et al. defined biological network motifs as biologically significant subgraphs [14]. Farina et al. identified regulatory network motifs from gene expression data, and they proposed the corresponding algorithm [15]. In order to specify network motifs, Ohnishi et al. analyzed an interfirm network consisting of about one million firms and four million directed links [16].

The study of social networks has always been a hot research topic. In order to judge the importance of nodes, the staple methods of traditional social network analysis are basing on the calculation of the centrality of nodes in network, [17, 18]. In recent years, various new methods are introduced into social network analysis; network motif is an important kind of them [19, 20]. Analyzing motifs for the large social networks derived from email communication firstly, Juszczyszyn found that the distribution of motifs in all analyzed real social networks is similar and can be treated as the network fingerprint. This property is most distinctive for stronger human relationships [21, 22].

In this paper, we introduce network motifs to develop a set of network analysis methods, which is different from the traditional social network analysis, and also illustrate its application.

2. Research Methods

2.1. Directed Graph and Point Centrality

A network with nodes is denoted by , where is node set,   is edge set. is  adjacency  matrix of , of which the elements are as follows: where .

The centrality analysis is the staple method of traditional social network analysis [17, 18]. In a network, if there are direct links between an actor and other actors, this actor resides in the centre of the network, having more “power” [17]. The importance of a node, point centrality, can be measured by the number of contacted nodes [18]. Based on the adjacency matrix, the formula of point centrality of node is as follows: where .

2.2. Network Motifs Finding
2.2.1. Frequent Graph

Frequent subgraph mining is an important method of network information mining. Frequent subgraph mining algorithms are divided into breadth-first search (BFS) algorithm and depth-first search (DFS) algorithm based on subgraph search path [23]. As a breadth-first search algorithm, Apriori graph mining (AGM) algorithm is an early adopter of Apriori idea. AGM algorithm takes an adjacency matrix to represent the graph. Then it generates code based on adjacency matrix and takes minimum coding as unique identification for the graph in order to solve NP problem of subgraph isomorphism [24].

The graph which is constituted by node set and edge set is a subgraph of . Let . Furthermore, the node set of subgraph is denoted as .

Based on the adjacency matrix of a subgraph, the maximum encoding is obtained as unique identification of the subgraph. AGM algorithm is used to mine frequent subgraphs based on the maximum encoding.

2.2.2. Random Network Model

In typical network motifs finding algorithms, random network model maintains the degree sequence of the real network very well [25]. Exchange algorithm is an algorithm for generating random network according to degree sequence, which is as follows [26].

Algorithm A.Input: degree sequenceOutput: random networkStep 1: Construct a network according to degree sequence.Step 2: Randomly select a pair of edges (e.g.,,).Step 3: Carry out the Monte Carlo exchange (,).Step 4: Cancel the exchange if the exchange has led to multiple edges or loops.Step 5: Repeat until reaching the target number of times.In this way, a set of random networks with the same degree sequence as can be obtained.

2.2.3. Statistical Significance of Network Motifs

Network motifs are frequent subgraphs with special statistical significance, which have some special functions in the network. Network motifs satisfy the following conditions: occurrence of the subgraph in real network is not less than a minimum and is significantly higher than their occurrence in random network [13, 27].

The statistical significance of network motifs is denoted by : where denotes the occurrences of a subgraph in real network and and denote mean and standard deviation of the occurrences of the subgraph in random networks.

2.3. Frequency Matrix

The nodes of a network always have a lot of roles, such as teacher and student. Most social networks can be simulated by role network model (RNM). The role set is denoted by , where is the number of roles. The set of nodes whose role is ) is denoted by , so the set of is denoted by . denotes the set of subgraphs whose structure is the same as the network motifs in the network.

In order to determine the role tendentiousness of the unknown role nodes, we can count the frequency of the unknown role nodes occurring in different network motifs, respectively, through the composition of nodes in network motifs, which contain different known role nodes.

Based on network motifs, frequency matrix   is obtained. The elements of denotes the total occurrences of node in the network motifs that contain the known role nodes, whose role is .

The algorithm for calculating frequency matrix is as follows.

Algorithm B.Input: , , Output: Step 1: For any , for any .Step 2: For any , for any , if , then for any ,.Step 3: Output .Step 4: End.

2.4. Evaluation Function
2.4.1. Tendentiousness Evaluation Function (TEF)

Based on frequency matrix , the tendentiousness of node with respect to role is evaluated by TEF.

Definition 1. The TEF value is defined by where ,  , and  . Obviously, the greater , the greater the tendentiousness of node with respect to role .

2.4.2. Importance Evaluation Function (IEF)

Based on point centrality and TEF, the importance of node with respect to role in the network is evaluated by IEF.

Definition 2. The IEF value is defined by The normalized form is where ,  ,  and . The greater , obviously, the greater the importance of node with respect to role in the network.

3. Application Analysis

The Intergalactic Crime Modelers (ICM) is investigating a conspiracy to commit a criminal act. The case involves 83 members and 400 messages between these people, as shown in Figure 1. As priorly known in [28], Jean, Alex, Elsie, Paul, Ulf, Yao, Harvey are conspirators, Darlene, Tran, Jia, Ellin, Gard, Chris, Paige, Este are nonconspirators.

Now, we analyze the set of prior conspirator and the set of prior non-conspirator by using the theory and methods of network motifs.

Firstly, let be “conspirator” and let be “non-conspirator”. Then the links are divided into two categories, of which daily topic is denoted by topic 1, and conspiracy topic is denoted by topic 2.

Based on adjacency matrix of the network, point centrality of nodes is calculated by using formulas (2) as shown in Table 1. The point centrality reflects the influence of a node in the network, which means the larger point centrality a criminal node have, the bigger negative impact on the network will occur. Therefore, ICM should focus on high-ranking members in Table 1.

According to the method in Section 2.2, frequent subgraphs and network motifs of the network are obtained, as shown in Table 2 (example).

Depending on the network motifs, frequency matrix is obtained by applying Algorithm B. Based on the evaluation functions TEF and IEF, the priority list of criminal tend and the priority list for key monitoring are obtained, as shown respectively, in Tables 3 and 4.

By comparing and analyzing, the “network motifs” offer a much more comprehensive way to analyze social networks; the high-ranking members having both higher point centrality and more criminal tend in Tables 3 and 4 are more suspect than others, so ICM should monitor them. Our research confirmed that the method is suitable in social network and the results are reliable.

4. Conclusions

In this paper, we developed a set of network analysis methods based on the theory and methods of social network analysis, frequent subgraph mining, network motifs, and so forth, which is different from the traditional social network analysis. In application analysis, a series of priority lists are obtained based on the evaluation functions. The priority lists reflect network information effectively, which is of great reference value for ICM.

Based on the study on node connection relationship of social networks, a follow-up study will involve more attention to structural relationship with more practical value.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The research is supported in part by the National Natural Science Foundation of China under Grant 11271163 and the Fundamental Research Funds for the Central Universities under Grant JUSRP51317B (China).