AI Computer Vision & Graphics Machine Learning & Data Science Research

Graph Kernel Attention Transformers: Toward Expressive and Scalable Graph Processing

A research team from Google Brain, Columbia University and University of Oxford proposes Graph Kernel Attention Transformers (GKATs), a new class of graph neural network that achieves greater expressive power than SOTA GNNs while reducing computation burdens.

The ability of graph neural networks (GNNs) to deal with graph data structures has made them suitable for real-life applications in social networks, bioinformatics, and navigation and planning problems in robotics. But despite their growing popularity, GNNs are not without their limitations, which include processing efficiency, the high computational complexity problem, and quadratic memory requirements for dense graphs.

To solve these issues, a research team Google Brain, Columbia University and University of Oxford has proposed a new class of graph neural networks, Graph Kernel Attention Transformers (GKATs), which combine graph kernels, attention-based networks with structural priors, and recent transformer architectures that apply small memory footprint implicit attention methods via low-rank decomposition techniques. The team demonstrates that GKATs achieve much more expressive power than SOTA GNNs while also reducing computation burdens.

The paper Graph Kernel Attention Transformers starts by posing the question: “Is it possible to design GNNs with densified individual layers modelling explicitly longer range node-to-node relationships in a graph, that enable more shallow architectures and at the same time scale up to larger (not necessarily sparse) graphs?”

The researchers then summarize their GKAT approach’s advantages over previous methods:

  1. GKATs are capable of modelling dependencies between distant nodes within a single attention layer.
  2. GKATs attention layers scale linear time & memory, not requiring explicitly storing graph representations (such as adjacency matrices) to compute attention layers if a graph kernel has finite (at least on expectation) (random) feature representation.
  3. GKATs are highly flexible since they can be used with various graph kernels.

The proposed GKATs were inspired by recent research on dense linear attention transformers, where studies have shown that kernel-based approaches are very effective over sparse attention layers. Following this path, GKATs model graph attention within each layer as a Hadamard product of the kernel matrix of the nodes’ feature vectors and a graph kernel matrix. This enables GKATs to leverage computationally efficient implicit attention mechanisms and model longer-range dependencies within a single layer, boosting their expressive power beyond that of traditional GNNs.

To define expressive kernels on graph nodes that lead to efficient mappings, the researchers employ a novel Random Walks Graph-Nodes Kernels (RWGNKs) approach, where the value for two nodes is given as a dot-product of two frequency vectors that record visits of random walks in graph nodes.

The full GKAT architecture consists of several blocks, with each block built from the attention layer and a standard MLP layer. The attention layers scale linearly rather than quadratically with the number of nodes of the input graphs, thus reducing computational complexity compared to their regular graph attention counterparts.

To evaluate GKAT with RWGNK kernels, the team conducted experiments ranging from purely combinatorial to bioinformatics tasks. They compared GKAT with graph convolution networks (GCNs), spectral graph convolution networks (SGCs), graph attention networks (GATs), etc.

On the motif-detection task, GKAT outperformed all other methods for all motifs. GKAT also showed the best performance on three out of four bioinformatics datasets and was among the top-two methods on four out of five social network datasets. In a time complexity testing experiment, GKAT was faster than its GCN, GAT and SGC counterparts.

Overall, the proposed GKATs demonstrate superior performance across a wide range of tasks, from purely combinatorial problems through social network data to bioinformatics challenges, indicating their ability to boost expressive power while reducing computational costs.

The paper Graph Kernel Attention Transformers is on arXiv.

Author: Hecate He | Editor: Michael Sarazen, Chain Zhang

We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.

1 comment on “Graph Kernel Attention Transformers: Toward Expressive and Scalable Graph Processing

  1. Pingback: r/artificial - [R] Graph Kernel Attention Transformers: Toward Expressive and Scalable Graph Processing - Cyber Bharat

Leave a Reply

Your email address will not be published. Required fields are marked *