New research from Cornell University, Facebook and Facebook AI proposes a high-accuracy graph learning method that trains quickly and outperforms large Graph Neural Network (GNN) models. Their “Correct and Smooth” (C&S) method is a general pipeline that achieves state-of-the-art (SOTA) performance on node classification tasks.
Although GNNs are the predominant technique for learning over graphs representing complex systems such as social networks, election maps or transportation grids, the researchers note a scaling problem seen across ML: “GNN models are becoming more expressive, more parameterized, and more expensive to train.”
In the paper Combining Label Propagation and Simple Models Out-Performs Graph Neural Networks, the team explains that their performance improvements are mostly due to directly using labels in the learning algorithm. C&S combines simple models that ignore the graph structure with two simple post-processing steps that exploit correlations in the label structure.
How do GNNs work? In a graph, entities are represented as vertices (or nodes), and interactions between them are edges connecting pairs of vertices. An attributed graph additionally records attributes of interest for each vertex. Some attribute information, however, may be missing on a subset of vertices. GNNs can predict these missing attributes by extracting information from vertex features and transforming the features in each vertex’s neighbourhood into a vector representation of the vertex.
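The neighbourhood transformation described above can be illustrated with a minimal sketch. It assumes plain unweighted mean aggregation, whereas real GNN layers add learned weights and nonlinearities on top; the function name `aggregate_neighbors` and the three-vertex toy graph are hypothetical and not taken from the paper.

```python
from typing import Dict, List


def aggregate_neighbors(
    features: Dict[str, List[float]],
    adjacency: Dict[str, List[str]],
) -> Dict[str, List[float]]:
    """One round of mean aggregation (an illustrative assumption, not a full GNN layer).

    Each vertex's new vector is the average of its own feature vector
    and the feature vectors of its direct neighbours.
    """
    result = {}
    for vertex, own in features.items():
        group = [own] + [features[u] for u in adjacency.get(vertex, [])]
        result[vertex] = [
            sum(vec[i] for vec in group) / len(group) for i in range(len(own))
        ]
    return result


# Hypothetical toy graph: a path a - b - c with 2-dimensional vertex features.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

embeddings = aggregate_neighbors(features, adjacency)
print(embeddings["a"])  # [0.5, 0.5]
```

Stacking several such rounds lets information travel more than one hop, which is where much of a GNN's training cost comes from.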
Paper co-author and Cornell University Assistant Professor Austin Benson says the new work suggests “you don’t need GNNs to cleverly learn to combine your neighborhood information. You just need smoothing (coming from label prop ideas).”
The researchers built a simple pipeline with three main stages:
- a base prediction made with node features that ignores the graph structure (e.g., an MLP or linear model)
- a correction step, which propagates uncertainties from the training data across the graph to correct the base prediction
- a smoothing of the predictions over the graph
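The three stages above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the authors' implementation: it treats the propagated "uncertainties" as residual errors (true label minus base prediction) on the training nodes, uses a symmetrically normalized adjacency matrix with a single damping factor `alpha` for both steps, and leaves out any rescaling of the correction. The base predictions are hard-coded stand-ins for the output of an MLP or linear model, and all names are hypothetical.

```python
from math import sqrt
from typing import Dict, List

Matrix = List[List[float]]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Return S = D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [
        [adj[i][j] / sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def propagate(S: Matrix, init: Matrix, alpha: float, steps: int) -> Matrix:
    """Iterate X <- (1 - alpha) * init + alpha * S X, starting from X = init."""
    n, k = len(init), len(init[0])
    x = [row[:] for row in init]
    for _ in range(steps):
        x = [
            [
                (1 - alpha) * init[i][c]
                + alpha * sum(S[i][j] * x[j][c] for j in range(n))
                for c in range(k)
            ]
            for i in range(n)
        ]
    return x


def correct_and_smooth(
    adj: Matrix,
    base: Matrix,
    train_labels: Dict[int, int],
    alpha: float = 0.8,
    steps: int = 50,
) -> List[int]:
    n, k = len(base), len(base[0])
    S = normalized_adjacency(adj)
    one_hot = {
        i: [1.0 if c == y else 0.0 for c in range(k)] for i, y in train_labels.items()
    }

    # Correct: residual error on training nodes, zero elsewhere, spread over the graph.
    error = [
        [one_hot[i][c] - base[i][c] for c in range(k)] if i in train_labels else [0.0] * k
        for i in range(n)
    ]
    error = propagate(S, error, alpha, steps)
    corrected = [[base[i][c] + error[i][c] for c in range(k)] for i in range(n)]

    # Smooth: reset training nodes to their true labels, then propagate again.
    start = [one_hot[i] if i in train_labels else corrected[i] for i in range(n)]
    smoothed = propagate(S, start, alpha, steps)
    return [row.index(max(row)) for row in smoothed]


# Hypothetical toy graph: two disconnected edges, 0 - 1 and 2 - 3.
adj = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
# Stand-in base predictions; nodes 1 and 3 lean towards the wrong class.
base = [[0.5, 0.5], [0.4, 0.6], [0.5, 0.5], [0.6, 0.4]]
train_labels = {0: 0, 2: 1}  # node index -> known class

predictions = correct_and_smooth(adj, base, train_labels)
print(predictions)  # [0, 0, 1, 1]
```

In this toy case the base model alone would assign node 1 to class 1 and node 3 to class 0; after the correction and smoothing steps, each unlabeled node takes the class of its labeled neighbour.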
The team examined the effectiveness of their method on datasets from various benchmarks:
- The Arxiv and Products datasets from the Open Graph Benchmark (OGB)
- Cora, Citeseer, and Pubmed, three classic citation network benchmarks
- wikiCS, a web graph
- A Facebook social network of Rice University
- A geographic dataset of US counties
- An email dataset of a European research institute
The results showed the C&S approach can match or exceed the performance of SOTA GNNs. On the Products dataset, for example, the framework with a linear base predictor achieved higher accuracy than the SOTA GNN model while training over 100 times faster and using 137 times fewer parameters. “The performance of our methods highlights how directly incorporating label information into the learning algorithm (as was done in traditional techniques) yields easy and substantial performance gains,” the team says.
The paper Combining Label Propagation and Simple Models Out-Performs Graph Neural Networks is on arXiv, and the code for the OGB dataset results can be found on the project GitHub.
Reporter: Fangyu Cai | Editor: Michael Sarazen