AI is booming. So is the compute used to train AI models. San Francisco-based non-profit AI research institute OpenAI yesterday released an analysis of the largest AI training runs since 2012. The researchers found that the amount of compute used for training has been doubling every 3.5 months. By comparison, the number of transistors per square inch on integrated circuits doubles only every 18 months, a pace known as Moore’s Law.
The analysis also showed that DeepMind’s 2017 AlphaGo Zero, which mastered the game of Go within days of training and with no human game records to reference, used over 300,000 times more compute than AlexNet, the breakthrough GPU-powered convolutional neural network (CNN) that won the 2012 ILSVRC (ImageNet Large-Scale Visual Recognition Challenge).
The results suggest that more compute tends to yield better performance in AI models, and that it is often complementary to algorithmic advances.
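To put the 3.5-month doubling time and the 300,000-fold increase side by side, the arithmetic can be sketched in a few lines of Python. This is an illustration only, not OpenAI's own calculation: the function names are hypothetical, and it assumes growth follows a clean exponential, which real training runs only approximate.

```python
import math


def months_for_growth(growth_factor, doubling_months):
    """Months needed to grow by growth_factor at a fixed doubling time."""
    return math.log2(growth_factor) * doubling_months


def growth_over(months, doubling_months):
    """Growth factor reached after `months` at a fixed doubling time."""
    return 2 ** (months / doubling_months)


# Assumption: the 300,000x increase follows the 3.5-month doubling exactly.
ai_months = months_for_growth(300_000, 3.5)

# Growth over the same period at an 18-month (Moore's Law) doubling time.
moore_growth = growth_over(ai_months, 18)

print(f"300,000x at a 3.5-month doubling takes about {ai_months:.0f} months")
print(f"An 18-month doubling over that period gives about {moore_growth:.0f}x")
```

Under these assumptions, a 300,000-fold increase corresponds to a little over five years of 3.5-month doublings, while an 18-month doubling over the same span yields only around a twelvefold increase.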
The analysis also suggests that the exponential increase in compute for AI training can be largely attributed to hardware development. Before 2012, AI researchers had merely co-opted GPUs for machine learning, but since 2016, specialized processors such as TPUs and more advanced interconnects that enable large-scale parallel computing have allowed researchers to devote far more compute to training AI models.
OpenAI researchers used two methods to measure compute: directly counting the operations used in a forward pass, or estimating operations from the number of GPUs deployed and the time spent training. For example, the AlexNet paper states that the network takes “between five and six days to train on two GTX 580 3GB GPUs”, from which the researchers estimated a total of about 0.0058 petaflop/s-days, where one petaflop/s-day is the work done by sustaining 10^15 operations per second for one day.
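The GPU-based estimate for AlexNet can be roughly reproduced as follows. This is a minimal sketch rather than OpenAI's published calculation: the function name is hypothetical, and two inputs are assumptions not given in this article, namely a peak of about 1.58 teraflop/s per GTX 580 and an average utilization of one third. The result is expressed in petaflop/s-days, where one petaflop/s-day is 10^15 operations per second sustained for a day.

```python
def pfs_days(num_gpus, peak_tflops_per_gpu, days, utilization):
    """Estimate training compute in petaflop/s-days from hardware and training time."""
    # Combined peak throughput of all GPUs, converted from teraflop/s to petaflop/s.
    peak_pflops = num_gpus * peak_tflops_per_gpu / 1000.0
    # Scale by the fraction of peak actually achieved, then by days of training.
    return peak_pflops * utilization * days


alexnet = pfs_days(
    num_gpus=2,                # from the AlexNet paper quote
    peak_tflops_per_gpu=1.58,  # assumption: GTX 580 single-precision peak
    days=5.5,                  # midpoint of "between five and six days"
    utilization=1 / 3,         # assumption: a third of peak throughput
)
print(f"{alexnet:.4f} petaflop/s-days")  # prints 0.0058 petaflop/s-days
```

The estimate is sensitive to the utilization assumption: doubling it doubles the result, so figures obtained this way are best read as order-of-magnitude estimates.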
Synced collected comments from AI scholars and journalists on OpenAI’s new analysis:
I’m not sure you can conclude that AI capabilities will always advance in step with compute power. That assumes all you need is more data and GPU horsepower.
— Will Knight, MIT Technology Review’s Senior Editor for Artificial Intelligence.
I think it’s totally backwards. Techies love playing with bigger toys, so they’ll try to use whatever they can get their hands on. The chart is showing the increasing availability of resources to DL researchers — that’s all.
— Jeremy Howard, formerly Kaggle President and Chief Scientist, now fast.ai Founding Researcher
If there is an underlying technique (architecture search, self-play, …) which can utilize vast compute effectively for a given real task, the amount of compute available for this purpose is astronomical.
— Stephen Merity, formerly Salesforce Senior Research Scientist
This chart via @OpenAI is amazing on its own, but the most interesting part imo is how much of an outlier DQN remains.
— Dave Gershgorn, Quartz Reporter for Artificial Intelligence
Journalist: Tony Peng | Editor: Michael Sarazen