While the power and performance of today’s large language models (LLMs) are beyond anything previously seen from AI, so too are their massive computational requirements. Spiking neural networks (SNNs) — which transmit information only when a neuron’s threshold is met — have emerged as an energy-efficient alternative to traditional artificial neural networks. SNNs, however, have yet to match the performance of deep neural networks (DNNs), and their effectiveness on language generation tasks remains unexplored.
In the new paper SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks, a research team from the University of California, Santa Cruz and Kuaishou Technology presents SpikeGPT, the first generative SNN language model. The team’s 260M-parameter version achieves DNN-level performance while maintaining the energy efficiency of spike-based computations.
The team summarizes their main contributions as follows:
- We provide the first demonstration of language generation using direct SNN training;
- We achieve performance comparable to that of ANNs, while preserving the energy efficiency of spike-based computations;
- We have successfully combined the powerful Transformer architecture with SNNs, without the need for additional simulation time steps, by utilizing linearization and recurrent Transformer blocks.
SpikeGPT is a generative language model with pure binary, event-driven spiking activation units. Inspired by the RWKV RNN language model, SpikeGPT integrates recurrence into a transformer block to make it compatible with SNNs, eliminates the quadratic computational complexity, and enables the representation of words as event-driven spikes.
Moreover, combining recurrent dynamics with linear attention enables SpikeGPT to process streaming data word by word, beginning computation before a full sentence has arrived while still retaining the long-range dependencies of complex syntactic structures.
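This streaming behaviour comes from replacing the quadratic attention matrix with a recurrence over a small running state, so each new token costs a constant amount of work. Below is a minimal single-channel sketch of an RWKV-style linear-attention recurrence; the function name and the scalar decay (`w`) and current-token bonus (`u`) parameters are illustrative simplifications, not the paper’s actual code.

```python
import math

def wkv_stream(ks, vs, w, u):
    """Recurrent (streaming) form of an RWKV-style linear attention.

    ks, vs: per-token key/value scalars (one channel, for clarity).
    w: decay applied to the history state (> 0).
    u: extra weight ("bonus") given to the current token.

    The state is just two running sums, so each token is O(1) --
    no attention matrix over the whole sequence is ever built.
    """
    num, den = 0.0, 0.0  # weighted sum of past values / of past weights
    outs = []
    for k, v in zip(ks, vs):
        # Output mixes the decayed history with the current token.
        out = (num + math.exp(u + k) * v) / (den + math.exp(u + k))
        outs.append(out)
        # Decay the old state, then fold the current token into it.
        num = math.exp(-w) * num + math.exp(k) * v
        den = math.exp(-w) * den + math.exp(k)
    return outs
```

Because the state only summarizes past tokens, the output at step t is identical whether the model sees the full sequence or just the prefix up to t — which is exactly what permits word-by-word processing.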
The team also introduces three techniques to boost SpikeGPT’s effectiveness:
- A binary embedding step converts the continuous outputs of the embedding layer into binary spikes, keeping them consistent with the SNN’s binary activations;
- A token shift operator combines information from the global context with the original token’s information, giving each token richer contextual information;
- A vanilla RWKV block replaces the conventional self-attention mechanism to reduce computational complexity.
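The first two of these techniques can be sketched in a few lines. The toy functions below illustrate a thresholded binary embedding and a token-shift mix; the threshold and mixing ratio are illustrative assumptions, not the paper’s actual parameterization.

```python
def binary_embed(x, threshold=0.5):
    """Threshold continuous embedding values into binary spikes.

    The 0.5 threshold is an illustrative choice. In practice the
    non-differentiable step function is trained with a surrogate
    gradient, a standard trick for directly training SNNs.
    """
    return [1.0 if xi > threshold else 0.0 for xi in x]

def token_shift(cur, prev, mix=0.5):
    """Blend the current token's features with the previous token's,
    giving each position cheap access to neighbouring context."""
    return [mix * c + (1.0 - mix) * p for c, p in zip(cur, prev)]
```

For example, `token_shift([1.0, 0.0], [0.0, 1.0])` averages the two feature vectors element-wise, so every position carries a trace of its predecessor before entering the spiking block.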
In their empirical study, the team trained SpikeGPT at three parameter scales (45M, 125M, and 260M) and compared it with transformer baselines such as Reformer, Synthesizer, Linear Transformer, and Performer on the Enwik8 dataset. In the evaluations, SpikeGPT achieved comparable results with 22x fewer synaptic operations (SynOps).
Overall, this work demonstrates how large SNNs can be trained to leverage advances in transformers via the proposed serialized version of their attention mechanisms and advances the possibility of significantly reducing LLMs’ compute burden by applying event-driven spiking activations to language generation. The researchers intend to continue testing and tweaking their model and will provide updates in their preprint paper.
The code implementation is available on the project’s GitHub. The paper SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks is on arXiv.
Author: Hecate He | Editor: Michael Sarazen