Conning with Delta, DeltaCon, for Graph Similarity
Graph-based representation, also known as network model, is a natural choice in many areas such as biology, computer science, communication, social sciences, and telecommunication etc. With graph neural networks on ascend, graph methods are increasingly getting more attention. Often, we are interested in quantifying the difference between a pair of graphs to gauge their similarity. Measuring the distance between two graphs is different from the graph isomorphism problem where we look for a mapping based on one-to-one correspondence between the nodes of the two graphs. Several methods for measuring distance between graphs have been suggested and used; in this post I will describe briefly the DeltaCon measure. This measure works under the assumption that the node correspondence between the two graphs is known. While there are several measures for computing graph distance between a pair of graphs, they all yield highly correlated results.
DeltaCon Measure of Graph Similarity
The basic idea behind this method is to calculate an affinity measure between a pair of nodes in the first graph, repeat the similar affinity calculations for the corresponding node pair in the second graph, and then compare the two measures to get an idea about the difference in the node pairs over two graphs. By repeating the node affinity calculations over all node pairs for the respective graphs, one can get an aggregate measure for similarity between the two graphs.
The pairwise node affinity s(i,j) between node i and node j is measured by looking at all possible paths between the two nodes. The DeltaCon paper shows that the similarity matrix 𝑆 of a graph 𝐺 measuring the similarity between different node pairs can be obtained by the following expression:
where 𝐼 is an identity matrix, 𝐷 is the degree matrix, and 𝐴 is the adjacent matrix of the graph 𝐺. The quantity 𝜖 is a small positive constant. Given two graphs 𝐺1 and 𝐺2, the DeltaCon distance measure is computed using the following relationship:
An important property of the DeltaCon measure is that it is a true metric satisfying the triangular inequality of a true distance.
Illustration of DeltaCon Measure
We will use the netrd library which provides a NetworkX-based interface to compute various graph distances along with many other graph utilities. We will use the Air Transportation Multiplex Dataset for experimentation. The dataset is composed of a multiplex network of the airlines operating in Europe. It contains thirty-seven different layers of networks each one corresponding to a different airline. We will use only two subsets of the dataset consisting of the three biggest major airlines (Air France, British Airways, and Lufthansa), and the three low-cost airlines (Air Berlin, Easy Jet, and Ryanair). A graph of the Ryanair is shown below; it consists of 149 nodes with 729 edges. This gives us the idea of the airlines networks in our example.
All the nodes in the dataset have an associated airport code and node indices are consistent over the entire multiplex. Thus, the node correspondence exists in the dataset. However, each airline flies to only a subset of the total nodes in the multiplex. Since DeltaCon requires node correspondence as well the same number of nodes in the two graphs to compare, we need to add nodes to the graphs under comparison to bring them to same size. The following code snippet shows how the DeltaCon distance is being computed for a pair of airlines; similar computation applies to other airline pairs.
import numpy as np
import networkx as nx
import netrd
f = nx.read_adjlist(France.txt)
Fr = nx.Graph(f)
b = nx.read_adjlist(Brit.txt)
Br = nx.Graph(b)
list1 = list(Fr.nodes)
list2 = list(Br.nodes)
new_list1=set(list1)-set(list2)#Elements of list 1 that are not present in list 2
new_list2=set(list2)-set(list1)#Elements of list 2 that are not present in list 1
Fr.add_nodes_from(new_list2)
Br.add_nodes_from(new_list1)
# Now both graphs have identical size and nodes are in correspondence
dist = netrd.distance.DeltaCon()
print(dist(Fr,Br))
1.5233983011182537
The distances between the networks of different airlines calculated using the DeltaCon measure are tabulated below. I also applied the single linkage clustering to the resulting distances. A dendrogram resulting from the single linkage clustering puts the three major airlines in one cluster and the three low-cost airlines in another cluster. This suggests that the DeltaCon distance is able to capture the underlying properties of a graph and can be used for graph discrimination.
While using DeltaCon, we must remember that it requires node correspondence between the pair of graphs whose similarity is being measured. When such a correspondence is not known, we can use other measures such as the spectral graph distance and the portrait divergence method. You can read about these methods at this blog post.