As increasingly complex Artificial Intelligence research makes ever greater demands on computer processing power, more tech companies are seeking ways to improve hardware performance. Nvidia's latest play is Volta.
Each May, Nvidia hosts the GTC (GPU Technology Conference) in San Jose. The conference introduces technology breakthroughs and new products, and showcases application and software solutions. At this year’s GTC, Nvidia took a huge stride in pushing the boundaries of AI computing.
Clearly, Nvidia CEO Jensen Huang and his company are serious about revolutionising AI computing, and to this end are keen to keep pushing their GPU technology to the next level. Volta is said to be the most powerful GPU computing architecture ever. It comprises no fewer than 21 billion transistors, delivering the equivalent performance of 100 CPUs for deep learning. It's five times more powerful than Pascal, its predecessor.
“I just put three billion dollars in my pocket,” joked Huang in reference to the investment Nvidia has made in the Tesla V100. Nvidia is betting big on Volta, and a look at the performance specs suggests they have good reason to do so.
Volta Tensor Cores
Volta arrived just one year after Pascal's release; there is usually a two-year gap between GPU architecture generations. This testifies to Nvidia's desire to keep pace with AI's increasing demands, as more AI-powered products, applications and solutions emerge in industries like finance, healthcare, transportation, and robotics.
Although Pascal has performed well in deep learning, Volta is far superior because it unifies CUDA Cores and Tensor Cores, a breakthrough technology designed to speed up AI workloads. Volta's Tensor Cores generate 12 times the throughput of Pascal, allowing the Tesla V100 to deliver 120 teraflops (trillions of floating-point operations per second) of deep learning performance.
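Conceptually, each Tensor Core performs a fused multiply-accumulate on small matrices, D = A × B + C, where A and B are 4×4 half-precision (FP16) matrices and the accumulation happens in single precision (FP32). A minimal NumPy sketch of that mixed-precision operation (illustrative only; real Tensor Cores are driven through CUDA's WMMA API, not NumPy):

```python
import numpy as np

# Sketch of the mixed-precision multiply-accumulate a Volta Tensor Core
# performs: D = A x B + C, with FP16 inputs and FP32 accumulation.
def tensor_core_mma(a_fp16, b_fp16, c_fp32):
    # Inputs arrive in half precision; the multiply-accumulate is
    # carried out in single precision to preserve accuracy.
    product = a_fp16.astype(np.float32) @ b_fp16.astype(np.float32)
    return product + c_fp32

a = np.ones((4, 4), dtype=np.float16)
b = np.ones((4, 4), dtype=np.float16)
c = np.zeros((4, 4), dtype=np.float32)

d = tensor_core_mma(a, b, c)
print(d)  # every entry is 4.0: the dot product of two length-4 rows of ones
```

The key design point is the precision split: FP16 inputs halve memory traffic, while FP32 accumulation avoids the rounding error that pure half-precision sums would introduce during training.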
Although the Tesla V100 will not be released until autumn 2017, its advances have already impacted Nvidia’s AI deployment strategy.
Expanding AI deployments
Last year, Nvidia released the Pascal-based DGX-1 system, used in data centres for AI research. The first-generation DGX-1, an AI supercomputer in a box, was notable for packing a capability equivalent to hundreds of CPUs into a small chassis.
The release of the DGX-1 system was hugely successful in driving a wide range of AI deployments for enterprises, research organisations and cloud service providers. Over just three months, Nvidia's data-centre revenue climbed by more than 30%, from $296 million to $409 million.
Volta will enhance and expand Nvidia's family of products, including the Volta-based DGX-1 and DGX Station supercomputers, a Volta-based Hyperscale Inference system, the Nvidia GPU Cloud, and the HGX-1 for GPU cloud computing.
The new Volta-powered DGX-1 leapfrogs its predecessor with significant advances in TFLOPS (170 to 960), CUDA cores (28,672 to 40,960), Tensor Cores (0 to 5,120), NVLink speed-up over PCIe (5X to 10X), and deep learning training speed (1X to 3X).
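A quick back-of-the-envelope check of those figures shows where the headline gain comes from: the TFLOPS jump far outstrips the growth in CUDA cores, with most of the difference delivered by the new Tensor Cores. A simple arithmetic sketch using the numbers above:

```python
# Generational ratios for the Pascal- vs Volta-based DGX-1,
# using the spec figures quoted in the article.
pascal = {"tflops": 170, "cuda_cores": 28672, "tensor_cores": 0}
volta  = {"tflops": 960, "cuda_cores": 40960, "tensor_cores": 5120}

tflops_gain = volta["tflops"] / pascal["tflops"]
core_gain = volta["cuda_cores"] / pascal["cuda_cores"]

print(f"TFLOPS gain:    {tflops_gain:.1f}x")  # ~5.6x
print(f"CUDA core gain: {core_gain:.2f}x")    # ~1.43x
# A ~43% increase in CUDA cores cannot explain a ~5.6x TFLOPS jump;
# the remainder comes from the 5,120 Tensor Cores that Pascal lacked.
```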
Nvidia touts its DGX Station, a smaller sibling of the DGX-1, as the world's first AI supercomputer designed for research labs and offices. Incorporating four Tesla V100 GPUs, water cooling, a next-generation NVLink connection, an Intel Xeon CPU, and three display ports, the DGX Station provides users with ease of experimentation and low-noise operation.
An Nvidia system specifically targeting web service companies is Hyperscale. Its latest, Tesla V100-compatible version provides a 15-25X inference speed-up over Intel's Skylake CPUs.
Another exciting development is the Nvidia GPU Cloud (NGC), something Nvidia customers have long been waiting for. The cloud-based platform will enable developers to train deep learning models on PCs (equipped with a TITAN X or GeForce GTX 1080 Ti), Nvidia DGX systems, or in the cloud.
The HGX-1, meanwhile, takes aim at cloud computing for deep learning, graphics, and CUDA computation. It is equipped with eight Tesla V100 GPUs and an NVLink hybrid cube interconnect, and supports three CPU-to-GPU configurations (2C:8G, 2C:4G, 1C:2G).
Seeking solutions in inferencing
Nvidia has been a leading company in AI computing largely thanks to the rise of GPUs since AlexNet won ImageNet in 2012. But GPUs are not a magic bullet: they suffer from high latency and high energy costs in inferencing. GPUs are great at training, but less competitive in inference.
This is why Volta might be the answer. Said Huang in his keynote speech, “Volta is groundbreaking work, incredibly good at training and incredibly good at inferencing.”
Huang also introduced TensorRT (a runtime) for TensorFlow, which accelerates training by 12x and inferencing by 6x. He described inferencing performance on ResNet-50 in terms of throughput and latency, measured in images per second: the V100 can process more than 5,000 images per second, whereas the P100 manages only about 600 and Intel's Broadwell CPU just 100.
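Images-per-second figures like these are conventionally derived by timing batched forward passes and dividing images processed by elapsed time. A minimal, framework-agnostic sketch of the measurement (the `run_inference` stub is a hypothetical stand-in for a real model such as ResNet-50, not Nvidia's actual benchmark harness):

```python
import time

def run_inference(batch):
    # Hypothetical stand-in for a model's forward pass; a real benchmark
    # would call the deployed network (e.g. a TensorRT-optimised ResNet-50).
    time.sleep(0.001)  # pretend each batch takes about 1 ms
    return [0] * len(batch)

def images_per_second(batch_size=64, num_batches=50):
    # Throughput = total images processed / wall-clock seconds elapsed.
    batch = list(range(batch_size))
    start = time.perf_counter()
    for _ in range(num_batches):
        run_inference(batch)
    elapsed = time.perf_counter() - start
    return (batch_size * num_batches) / elapsed

print(f"{images_per_second():.0f} images/sec")
```

In practice, larger batches raise throughput but also raise per-image latency, which is why Huang framed inference performance in terms of both metrics rather than throughput alone.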
Energy cost in unconnected devices
Another key feature of Volta is energy efficiency, which is critical for unconnected devices such as cars.
Last year, Nvidia unveiled an AI supercomputer for autonomous cars dubbed Xavier, a processor integrating a CPU, a CUDA GPU and deep learning accelerators. Following Volta's debut, Xavier's GPU core was upgraded from Pascal to Volta, significantly reducing energy costs.
“Clearly, Nvidia cares a lot about energy efficiency in neural networks. I think Xavier is going to be really exciting in running complicated neural networks with lower latency and better devices,” said Nvidia Vice President Bryan Catanzaro.
Another exciting announcement was Nvidia’s decision to open source Xavier’s DLA, with early access coming in June 2017 and full access in September.
Volta’s world of potential
Volta is driving AI forward in terms of deployment, inference performance and reduced energy cost. It is not merely an upgraded Pascal but a revolutionary GPU architecture. In this light, we can begin to glimpse the other innovations that Volta and Volta-based GPUs might bring to the world.
Catanzaro said he is excited to see how Volta will influence algorithms in AI. “What I expect to happen to AI is that people are going to try models they could not try before, that take a lot more (tera)flops. My personal expectation is that we are going to see architecture shift a little bit to take advantage of the flops that we have. This may shift the way people are designing their models.”
HPC (High Performance Computing) is another good example of Volta's potential. With Volta's powerful computing capability, scientists are expected to create more AI applications that solve HPC problems.
As Huang and the tech giants note, the world is reaching the point where Moore's Law, the capability-doubling observation that has enabled the world to advance micro-processor architecture for decades, is coming into conflict with the laws of physics where CPUs are concerned. Everyone is looking for new solutions in a post-Moore's Law era. Nvidia is all-in on GPUs, which so far seem exempt from Moore's Law's slowdown. Moreover, if GPUs move to the cloud, limits on the number of cores in a single physical space become irrelevant; GPUs will have unlimited room to grow.
No one knows whether GPUs will be the final answer, but at least in the short term they are the major force accelerating AI computing. And Volta is at the vanguard.
Author: Tony Peng, Synced Tech Journalist | Editor: Michael Sarazen