Variants of Graph Neural Networks (GNN)

Open-Source Internship opportunity by OpenGenus for programmers. Apply now.

Graphs are data structures consisting of nodes and the relationships between nodes, known as edges. Recently there have been many model architectures leveraging the power of graphs and their characteristics. The fundamental models were called Graph Neural Networks. Once the potential was discovered of these graph-based models, the artificial intelligence community has continuously developed the variants of graph-based models depending on various use cases.

Just as a refresher, let us look at the fundamental driving blocks of GNN. A simple GNN works based on input, i.e. node values, and the way the network propagates. There is one more parameter that makes a particular model unique: the training methodology. In a GNN, the inputs are taken based on the propagation step, which in standard architecture is called message passing. Here it aggregates every node value and passes it through an activation function. The network’s training methodology is getting the final structure/representation and goal value correct.

Now that we know what traditional architecture is, let us see how we can classify and understand different types of GNNs. The types are decided based on three categories.

Based on the Graph type
Based on the propagation step
Based on the training method

First based on graph types. As we know that if graphs are of many types and as the fundamental building block changes, the algorithm will change.
The types based on the graph are:

Directed Graph – DGP
Heterogeneous graph – Graph inception, HAN
Edge-informative graph – G2S, R-GCN
Dynamic graph – DCRNN, Structural-RNN

Although GNN is a powerful architecture for modeling structural data, there are several limitations, which can be removed by some tweaks. The first tweak is w.r.t the type of graph used. As mentioned above the first is the directed graph.

An undirected edge suggests that there is a relationship between the two nodes, but a directed edge could provide more information. For example, if we have something like a class hierarchy, then we can easily represent that type of data using a directed graph, like the head being child and tail as the parent., or vice-versa. For this, we created a DGP algorithm that uses two weight matrices. Wp and W¬c, the weights for parents and children, respectively.

Then there are heterogeneous graphs. This kind of graph consists of various types of nodes. The computations are done by getting every node representation similar by transforming the values by one-hot encoding. Due to the ability of different nodes, the algorithm Graph Inception was developed, which used this characteristic by grouping different neighbors and clustering them to be used as a whole. These clusters are also called sub-graphs which can be used to have parallel computations. And keeping the same heterogeneous property in mind, the Heterogeneous Graph Attention Networks were created.

Next are the dynamic graphs. These kinds have static graph structure and dynamic inputs. This allows for the adaptive structures or algorithms which require dynamicity in the internal structures. When the idea of Structural-RCNN came, it seemed difficult because of two inputs, spatial as well as temporal messages at the same time. But with the dynamic graphs, it is easily possible.

Finally, the graphs with edge information. As the name suggests, the edges have some additional information like weights or the type of edge. This information can help in creating the architectures like G2S encoders and R-GCN. Especially in R-GCN, which is the extension of GCN. The R means, relational. So, while working with relational data, it becomes easier to have the graph with edges that can hold additional information like the relationship between those nodes.

The types based on propagation step:

Convolutional aggregator – Graph Convolutional Network
Attention aggregator – Graph Attention Network
Gate updater – GRU, LSTM
Skip connection – Highway GNN, Jump knowledge network
Hierarchical graph – ECC

The propagation steps do allow for versatility which is evident from the types mentioned. To understand these types of GNNs, it would be simpler to compare them to traditional neural networks. It is important to remember that the propagation step consists of aggregating the neighboring node values. So, the main difference may also be called as aggregators.

First, convolutional aggregator. This is similar to the convolutional neural network. So, these networks work with image data. The fundamental idea remains the same. The high-level data is slowly convoluted into lower size data. GCN works with the same core idea. With two options, one with spectral representations and second with spatial representations. The other formats of GCN in the spectral domain are AGCN (Adaptive-GCN) and GGP, whereas in the spatial domain we have DCNN and DGCN.

In the same way, attention networks can be related to attention aggregators which are a fundamental concept of Graph Attention Networks and Gated Attention Networks. In the gate updaters, the core blocks are like the GRU and LSTM networks. Using the GRU, we make the Gated Graph Neural Network (GGNN). With the LSTM blocks, we can build architectures like Graph LSTM, which can be further divided into Tree LSTM, Sentence LSTM, and Graph LSTM.

With the skip connection networks, we build the architectures with core ideas parallel to residual networks. The architectures are Jump knowledge network, which can be understood by the name itself and the Highway GNN. The hierarchical graph architectures include the Edge-conditioned convolution (ECC) networks. It uses an edge-information graph so that the information can be conditioned to something useful. The same is then used for the computations related to propagation.

The types based on training methods:

Neighborhood sampling – FastGCN, GraphSAGE
Receptive field control – ControlVariate
Data augmentation – Co-training, self-training
Unsupervised learning – GAE, GCMC

The original GCN lacked the ability for inductive learning. To overcome this, we used neighborhood sampling architectures. The algorithm GraphSAGE is a comprehensive improvement over the original GCN. To make the inductive learning adaptable, the algorithm replaced the full graph Laplacian with learnable aggregation functions.

With the receptive field control, the ControlVariate architectures appended the stochastic approximation algorithms for GCN, which utilizes the historic activations of the nodes as a control variate. With data augmentation, we have two network architectures, co-training, and self-training. With the limitation of the requirement of large labeled data for training a GCN, the authors proposed these two architectures. The co-training utilizes the power of k-means for the neighbors in the training data and self-training follows a boosting based architecture.

Unsupervised learning was proposed to take out the problem of the requirement of a large labeled dataset for training. The architectures include Graph Autoencoders (GAE) and GCMC, which follows a Monte-Carlo based approximation. The graph autoencoders first use the GCNs to encode the nodes in the graph and then use a decoder to compute the adjacency matrix for computing the loss and taking the training further.

Conclusion

So, this is how based on the limitations of the traditional graph-based algorithms, we tweak the fundamental blocks to build better architectures. We do know graph-based algorithms are always useful in structural scenarios but now with these architectures, we can successfully train the networks for non-structural scenarios too. With the improvements over traditional networks, it is now possible to efficiently train the models for advanced applications like semantic segmentation, sequence labeling, and event extractions.

With this article at OpenGenus, you must have the complete idea of Variants of Graph Neural Networks (GNN). Enjoy.

Variants of Graph Neural Networks (GNN)

Machine Learning (ML)

A Deep Learning Approach for Native Language Identification (NLI)

Number of times characters of a string is present in another string