A new study introduces a reproducible graph neural network (GNN) benchmarking framework designed to study and quantify the impact of theoretical developments on GNNs. GNNs have become an essential tool for analyzing and learning from data on graphs, with promising applications in domains such as chemistry, physics, the social sciences, knowledge graphs, recommendation, and neuroscience. How to study and build more powerful GNNs is therefore a hot research topic.
Without a standardized benchmark, it’s hard even to define what constitutes a “powerful” GNN. In the paper Benchmarking Graph Neural Networks, the researchers propose a flexible GNN benchmarking framework that also makes it easy for researchers to add new datasets and models. The team includes Yoshua Bengio and researchers from Nanyang Technological University, Loyola Marymount University, Mila, the University of Montreal, and CIFAR.
Designing a fair benchmark first requires defining representative, realistic and large-scale datasets, so the researchers chose not to rely on the popular CORA and TU datasets. One of the authors, Professor Xavier Bresson, explained, “Our goal was to identify trends and good building blocks for GNNs. Such analysis was not possible with small CORA and TU datasets (all GNNs perform the same as well as non-graph NNs).” Most previously published work has focused on small datasets such as CORA (a single citation graph of a few thousand nodes) and the TU collection (many of whose datasets contain only a few hundred graphs), so it was inevitable that researchers would eventually run into their limitations.
The researchers instead proposed a collection of medium-scale datasets containing 12k to 70k graphs, each with between 9 and 500 nodes. The graphs come from mathematical modelling (Stochastic Block Models), computer vision (superpixels), combinatorial optimization (the Traveling Salesman Problem) and chemistry (molecular solubility), and are intended to expose clear and statistically meaningful performance differences when different GNN architectures are compared.
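To give a sense of what a “mathematical modelling” graph looks like, the following is a minimal sketch of sampling one graph from a Stochastic Block Model in plain Python. It is not the benchmark’s actual generator: the function name sample_sbm, the block sizes and the connection probabilities p (within a community) and q (between communities) are illustrative assumptions.

```python
import random

def sample_sbm(block_sizes, p, q, seed=None):
    """Sample an undirected graph from a Stochastic Block Model (illustrative sketch).

    Nodes in the same block are connected with probability p,
    nodes in different blocks with probability q.
    Returns (labels, edges): labels[i] is node i's block index,
    edges is a list of (u, v) pairs with u < v (no self-loops).
    """
    rng = random.Random(seed)  # local RNG so a fixed seed gives a reproducible graph
    labels = [block for block, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if labels[u] == labels[v] else q
            if rng.random() < prob:
                edges.append((u, v))
    return labels, edges

# Hypothetical settings: three communities of 20 nodes, dense inside, sparse between.
labels, edges = sample_sbm([20, 20, 20], p=0.5, q=0.05, seed=0)
print(len(labels), len(edges))
```

With p much larger than q, the communities are visible in the edge structure but not in any node feature, so a node classification task on such graphs asks a model to recover each node’s community from the graph alone.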
Another problem with small datasets is overfitting. Small datasets are handy when researchers are developing new ideas, but in the long run, as GNN models become more mature and advanced, relying on small datasets will only worsen overfitting. Small datasets can also undermine the reproducibility of experimental results. Without standard experimental settings, such as agreed-upon training, validation and test splits and evaluation protocols, the performance of new GNN architectures cannot be compared fairly.
In this study, the researchers used the proposed open-source benchmarking framework to run numerical experiments on node classification, edge classification, graph classification, and graph regression. From graph classification experiments on the TU and SuperPixel datasets, they concluded that graph-agnostic NNs performed as well as GNNs on small datasets. Graph regression experiments on the ZINC molecular dataset, meanwhile, indicated that “as expected, graph NNs outperform non-graph NNs for larger datasets.” Bresson remarked, “… nothing new under the sun but it was important to show this experimentally.”
The numerical experiments also demonstrated that residual connections improve performance and are an important building block for designing deep GNNs. Examining deep GNN results on the ZINC, CLUSTER and TSP test sets, the researchers observed that as the number of layers increased, performance improved for all models except GIN. They further concluded that “graph convolution, anisotropic diffusion, residual connections and normalization layers are universal building blocks for developing robust and scalable GNNs.”
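The idea behind a residual connection can be sketched in a few lines: each layer computes an update from the graph and adds it to its input rather than replacing it. The plain-Python example below is an assumption-laden illustration, not any model from the paper. It uses a simple isotropic mean-over-neighbours convolution (not the anisotropic variants the authors favour), and the helper names mean_aggregate, matmul, relu and residual_gcn_layer are hypothetical.

```python
def mean_aggregate(h, adj):
    """Average neighbours' feature vectors (simple isotropic graph convolution).
    Nodes with no neighbours get a zero vector."""
    dim = len(h[0])
    out = []
    for nbrs in adj:
        if nbrs:
            out.append([sum(h[j][k] for j in nbrs) / len(nbrs) for k in range(dim)])
        else:
            out.append([0.0] * dim)
    return out

def matmul(x, w):
    """Multiply a list-of-rows matrix x by weight matrix w."""
    return [[sum(row[k] * w[k][c] for k in range(len(w))) for c in range(len(w[0]))]
            for row in x]

def relu(x):
    return [[max(0.0, v) for v in row] for row in x]

def residual_gcn_layer(h, adj, w):
    """h_next = h + ReLU(mean_aggregate(h) @ w).
    w must be square so the update has the same shape as h."""
    update = relu(matmul(mean_aggregate(h, adj), w))
    # Residual connection: add the update to the layer's input.
    return [[a + b for a, b in zip(h_row, u_row)] for h_row, u_row in zip(h, update)]

# Hypothetical path graph 0 - 1 - 2 with 2-dimensional node features.
adj = [[1], [0, 2], [1]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(residual_gcn_layer(h, adj, identity))
```

Because the input is passed straight through, a layer whose weights produce no update simply returns its input, which is one common intuition for why stacking many residual layers is less likely to degrade performance than stacking plain ones.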
Journalist: Fangyu Cai | Editor: Michael Sarazen