The ability of graph transformers (GT) to model long-range interactions has improved expressivity and made such architectures a promising alternative to traditional graph neural network (GNN) approaches. The GT downside is their poor scalability, which renders them prohibitively expensive when dealing with large and complex graphs.
In the new paper Exphormer: Sparse Transformers for Graphs, a research team from the University of British Columbia, Google Research and the Alberta Machine Intelligence Institute proposes Exphormer, a framework that equips sparse GTs with impressive scalability, reduces computational complexity to linear, and achieves state-of-the-art performance on graph benchmarks.
The team summarizes their main contributions as follows:
- We propose sparse attention mechanisms with computational cost linear in the number of nodes and edges.
- We introduce Exphormer, which combines the two techniques for creating sparse overlay graphs.
- We also introduce expander graphs as a powerful primitive in designing scalable graph transformer architectures.
- We are able to show that Exphormer, which combines expander graphs with global nodes and local neighbourhoods, spectrally approximates the full attention mechanism with only a small number of layers, and has universal approximation properties.
The proposed Exphormer is based on and inherits the desirable properties of the recently introduced GraphGPS (Rampasek et al., 2022) modular framework for building general, powerful and scalable GTs with linear complexity. GraphGPS combines traditional local message passing and a global attention mechanism while also allowing for “sparse” attention mechanisms to boost performance and reduce computation costs.
Exphormer applies an expander-based sparse attention mechanism to GTs, constructing an interaction graph comprising three main components: 1) Expander graph attention, which enables propagating information between nodes pairs without connecting all pairs of nodes, 2) Global attention, which adds virtual nodes to enable a global “storage sink” while also providing universal approximator functions for full transformers, and 3) Local neighbourhood attention, which models local interactions to obtain information about connectivity.
In their empirical study, the team evaluated Exphormer on graph and node prediction tasks. In the experiments, Exphormer combined with message-passing neural networks (MPNN) in the GraphGPS framework achieved state-of-the-art results on several benchmark datasets, surpassing all other sparse attention mechanisms and remaining competitive with dense transformers that have far more parameters.
Author: Hecate He | Editor: Michael Sarazen
We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.