AI and deep learning have become ubiquitous in the modern world. The increasing development and deployment of high-performance models, however, has revealed a concerning downside: their high computational requirements and energy consumption.
In a bid to make machine learning (ML) models more energy efficient and reduce environmentally harmful CO2 equivalent (CO2e) emissions, a research team from Google and the University of California, Berkeley recently examined the energy use and carbon footprint of popular large-scale models T5, Meena, GShard, Switch Transformer and GPT-3. In the paper Carbon Emissions and Large Neural Network Training, the team introduces reduction strategies and endorses previous appeals for publication norms designed to make the energy use and CO2e emissions of computationally intensive ML models more transparent.
The researchers identify several opportunities for improving energy efficiency and reducing CO2e emissions:
- Large but sparsely activated DNNs can consume less than 1/10th the energy of large, dense DNNs without sacrificing accuracy, despite using as many or even more parameters.
- Geographic location matters for ML workload scheduling since the fraction of carbon-free energy and resulting CO2e can vary by ~5X-10X, even within the same country and the same organization. It is possible to optimize where and when large models are trained.
- Specific datacentre infrastructure matters: cloud datacentres can be ~1.4-2X more energy efficient than typical datacentres, and the ML-oriented accelerators inside them can be ~2-5X more effective than off-the-shelf systems.
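The location point above can be made concrete: given per-region carbon intensities, a scheduler can simply place a training job in the cleanest available region. A toy Python sketch, in which the region names and gCO2e/kWh figures are invented for illustration (real values vary by time of day and must come from the provider or grid operator):

```python
# Hypothetical grid carbon intensities in gCO2e per kWh.
GRID_INTENSITY = {
    "region-a": 550,   # coal-heavy grid
    "region-b": 120,   # hydro-heavy grid
    "region-c": 60,    # mostly wind/solar plus storage
}

def greenest_region(intensities):
    """Pick the region whose grid currently emits the least CO2e per kWh."""
    return min(intensities, key=intensities.get)

print(greenest_region(GRID_INTENSITY))  # -> region-c
```

The same comparison can be run per hour rather than per region, capturing the paper's point that *when* a job runs matters as well as *where*.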
The team’s investigation of CO2e emissions surprisingly finds that with prudent choices of processor, hardware and datacentre, the carbon footprint of training large deep neural networks (DNNs) can be reduced by up to ~100-1000X.
The researchers first designed a simplified formula for calculating an ML model’s carbon footprint that considers five factors: the program that implements it; the number of processors that run the program; the speed and power of those processors; a datacentre’s efficiency in delivering power and cooling the processors; and the energy supply mix (renewable, gas, coal, etc.).
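Those five factors combine into a simple estimate: energy consumed (training hours times processor count times average power draw, scaled by the datacentre's power usage effectiveness) multiplied by the carbon intensity of the local energy mix. A minimal Python sketch; the function names and the sample numbers are illustrative, not taken from the paper:

```python
def training_energy_kwh(hours, num_processors, avg_watts_per_processor, pue):
    """Energy used to train a model: compute time x processor count x
    average power draw, scaled by the datacentre's power usage
    effectiveness (PUE) to account for cooling and power delivery."""
    return hours * num_processors * (avg_watts_per_processor / 1000.0) * pue

def training_co2e_kg(energy_kwh, grid_kg_co2e_per_kwh):
    """CO2e emissions: energy consumed times the carbon intensity of
    the local energy mix (renewables, gas, coal, etc.)."""
    return energy_kwh * grid_kg_co2e_per_kwh

# Illustrative numbers only: 100 accelerators drawing 300 W for 240 hours
# in a datacentre with PUE 1.1, on a grid emitting 0.4 kg CO2e/kWh.
energy = training_energy_kwh(240, 100, 300, 1.1)   # 7920.0 kWh
emissions = training_co2e_kg(energy, 0.4)          # 3168.0 kg CO2e
```

Each factor in the formula maps onto one of the reduction levers discussed below: a better program shrinks the hours, better processors shrink the watts, a better datacentre shrinks the PUE, and a cleaner energy mix shrinks the kg CO2e per kWh.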
They then look at four factors that can contribute to carbon footprint during model training — algorithm/program improvement, processor improvement, datacentre improvement and energy mix improvement — and summarize their associated business rationale:
- Algorithm/program improvement: Training faster saves ML researchers time, saves their organization money, and reduces CO2e.
- Processor improvement: Customized hardware such as tensor processing units (TPUs) can reduce operating costs (electricity consumed plus amortized capital expenditure on the computers, cooling, power distribution and the building), delivering performance gains along with reductions in cost and CO2e emissions.
- Datacentre improvement: Cloud companies prefer energy-efficient datacentres since they save money and lower emissions.
- Energy mix improvement: Cloud computing allows large companies like Google to maintain a global portfolio of datacentres, enabling them to purchase local clean energy directly.
The researchers explore the impact of these choices on five large-scale natural language processing (NLP) models. They test Google’s T5, Meena, GShard and Switch Transformer; and OpenAI’s GPT-3, which runs on the Microsoft Azure Cloud.
The results demonstrate that improving the energy efficiency of algorithms, datacentres, hardware and software can make training on large NLP models much more efficient.
Additional carbon-related factors are also addressed in the paper, such as how neural architecture search (NAS) can improve the energy efficiency of downstream applications, with benefits that dramatically outweigh its cost; and how tools such as the ML Emissions Calculator and CodeCarbon can be used to automatically measure carbon emissions.
The paper proposes using standard ML algorithmic techniques such as distillation, pruning, quantization and efficient coding to improve energy efficiency and notes that it also matters which datacentre is used, even within the same organization. Although reducing training cost matters too, the team suggests the benefits of some large language models are worth the cost. For instance, the COVID-19 Research Explorer, powered by BERT, helps scientists and researchers efficiently pore through a massive corpus of articles for answers to COVID-19-related questions.
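To illustrate one of those techniques: post-training quantization stores weights as 8-bit integers rather than 32-bit floats, roughly quartering memory traffic and the energy it costs. A minimal pure-Python sketch of symmetric int8 quantization (illustrative only, not the implementation used by any of the models above):

```python
def quantize_int8(weights):
    """Symmetric linear quantization: map floats in [-max, max] to int8."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.52, -1.27, 0.03, 0.89]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each recovered weight is within half a quantization step of the original.
assert all(abs(a - w) <= scale / 2 + 1e-9 for a, w in zip(approx, weights))
```

Distillation and pruning pursue the same end by different means: a smaller student model or a sparser weight matrix does less work per inference, so it consumes less energy.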
The team also notes that measurements are more reliable than extrapolations: theoretical performance per watt typically exceeds measured performance per watt, by factors of ~1.6X for TPUs and ~3.5X for GPUs.
The paper’s overarching message is that pooling these various efficiency improvements will genuinely benefit enterprises adopting greener ML models. To that end, the team recommends including energy consumption in future MLPerf benchmarks to guide the ML community, whose understanding and expertise will be key to designing and developing systems that reduce CO2e emissions.
The paper Carbon Emissions and Large Neural Network Training is on arXiv.
Author: Hecate He | Editor: Michael Sarazen