These days the total cost of training an NLP model can climb into millions of dollars. And so it is only natural that budget-constrained researchers, engineers and scientists, when planning their model-training experiments, ask the important question: How much is this going to cost? And what are the main factors affecting that price tag?
Israeli research company AI21 Labs looks for answers in their recently published paper The Cost of Training NLP Models: A Concise Overview.
AI21 Labs Co-CEO, Stanford University Professor of Computer Science (emeritus), and AI Index initiator Yoav Shoham describes the motivation for the project. “It started with an inquiry we got at the AI Index. I started jotting down a quick answer and realized it deserved a longer one. I also realized we had a lot of the expertise at AI21 Labs. So we spun up a small effort to put this report together, to benefit the community.”
The team compared three different-sized Google BERT language models on the 15 GB Wikipedia and Book corpora, evaluating both the cost of a single training run and a typical, fully-loaded model cost. The fully-loaded estimate includes hyperparameter tuning and multiple runs for each setting: “We look at a somewhat modest upper bound of two configurations and ten runs per configuration.” The estimated costs, from a single run to the fully-loaded figure, are:
- $2.5k – $50k (110 million parameter model)
- $10k – $200k (340 million parameter model)
- $80k – $1.6m (1.5 billion parameter model)
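The ranges above are consistent with simple arithmetic: two configurations times ten runs each gives 20 runs, and each upper figure is 20 times the lower one. The Python sketch below reproduces that calculation. It assumes the lower figure in each range is the single-run cost and the upper figure is the fully-loaded cost; the helper name `fully_loaded_cost` and its parameters are hypothetical, and the paper's own estimate may account for factors this multiplication does not.

```python
# Single-run cost estimates in USD, taken from the low end of each range above.
# Treating the low end as the single-run cost is an assumption of this sketch.
SINGLE_RUN_COST_USD = {
    "110 million parameters": 2_500,
    "340 million parameters": 10_000,
    "1.5 billion parameters": 80_000,
}


def fully_loaded_cost(single_run_cost, configurations=2, runs_per_configuration=10):
    """Hypothetical helper: single-run cost multiplied by the total number of runs."""
    total_runs = configurations * runs_per_configuration
    return single_run_cost * total_runs


for model, cost in SINGLE_RUN_COST_USD.items():
    print(f"{model}: ${cost:,} per run, ${fully_loaded_cost(cost):,} fully loaded")
```

Changing `configurations` or `runs_per_configuration` shows how quickly the total grows once tuning and repeated runs are included.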
Training costs can vary drastically depending on technical parameters, climbing to US$1.3 million for a single run when training the 11 billion parameter variant of Google’s Text-to-Text Transfer Transformer (T5) neural network model. A project that requires several runs could see total training costs hit a jaw-dropping US$10 million.
The researchers note that the cost of floating-point operations (FLOPs) and of basic neural network operations, for example, has been falling, while the rise in overall costs is being driven by increasing dataset size, model size and training volume: “In NLP, everything is big and getting bigger.”
Precise cost estimates for particular NLP models or task training procedures are, however, difficult to make. As Shoham told Synced, “It’s influenced by multiple parameters. Besides the technical parameters, there are also personal and organizational considerations; a researcher might be impatient to wait three weeks to do a thorough analysis, and his/her organization may not be able to, or wish to pay for it. So for the same task one could spend $100k or $1 million.”
The study says the reason for NLP’s current mega-scale, brute-force statistical approach and SOTA leaderboard races is simple: “It works; it has yielded better performance than any alternative.”
Shoham suggests, however, that it’s time for a change, as chasing leaderboard rankings on narrow tasks often leads to over-optimizing for the challenge set and overfitting. “The community knows this,” he says.
Instead of devoting more compute to leaderboard races, Shoham proposes the research community look into more efficient neural network architectures and how to take better advantage of data.
The paper The Cost of Training NLP Models: A Concise Overview is on arXiv.
Journalist: Fangyu Cai | Editor: Michael Sarazen