
Make AI Computing 100 Times Faster


Chinese AI talent in US – UIUC Professor Wen-Mei Hwu

The world is at the point where virtually all technologies rely on computing. And nowhere is the importance of computing more pronounced than in the arena of artificial intelligence. The adoption of GPUs (graphics processing units) for general-purpose processing enabled AlexNet, a convolutional neural network (CNN) implemented in CUDA and running on GPUs, to win the 2012 ILSVRC (ImageNet Large-Scale Visual Recognition Challenge, aka the Olympics of computer vision) by a wide margin in accuracy.
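The advantage GPUs brought to CNNs is data parallelism: the convolutions at the heart of a network reduce to large matrix multiplications, which map naturally onto thousands of GPU cores. A minimal NumPy sketch of this reformulation (the array sizes here are invented for illustration, not AlexNet's actual layer shapes):

```python
import numpy as np

def conv2d_naive(image, kernel):
    """Direct 2-D convolution (valid mode) with explicit loops."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def conv2d_as_matmul(image, kernel):
    """The same convolution expressed as one matrix multiply
    (the "im2col" trick) -- the formulation GPUs run in parallel."""
    H, W = image.shape
    kH, kW = kernel.shape
    oH, oW = H - kH + 1, W - kW + 1
    # Gather every kH x kW patch into one row of a big matrix.
    patches = np.array([image[i:i + kH, j:j + kW].ravel()
                        for i in range(oH) for j in range(oW)])
    return (patches @ kernel.ravel()).reshape(oH, oW)

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
k = rng.standard_normal((3, 3))
# Both formulations agree; only the second parallelizes well.
assert np.allclose(conv2d_naive(img, k), conv2d_as_matmul(img, k))
```

On a GPU, each row of the patch matrix can be processed by a different group of threads at once, which is why moving CNN training to GPUs cut training times so dramatically.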

And now, everything is AI-ed. The cutting-edge technology came of age in 2016, and its impact has intensified this year. However, the AI revolution still has a long way to go. Even Microsoft admitted the technology is not yet mature. Why? Current computing capabilities cannot keep up with the cost and energy demands of increasingly complicated AI workloads.

The whole industry is investing heavily in hardware that accelerates the training of machine learning models. Google, for example, built the Tensor Processing Unit, or TPU. Microsoft adopted the field-programmable gate array, or FPGA. Meanwhile, Intel made a progressive move with its $16.7 billion acquisition of Altera, a manufacturer of programmable logic devices.

IBM is seeking collaboration with academia, and last year announced a multi-year cognitive computing research project with the University of Illinois at Urbana-Champaign (UIUC). Undoubtedly, the industry leader came for Professor Wen-Mei Hwu, one of the foremost experts in parallel computing at UIUC, someone with 10 years’ experience in GPUs and 30 years’ experience in computer architecture. IBM’s ambitious objective is to increase machine training speed by 100 times.

UIUC Professor Wen-mei Hwu. Credit: UIUC CSSA

“We want to be able to train a machine learning model, like Google Translate, in a few hours,” said Prof. Hwu. Best known for enabling the parallelism capability of the GPU to solve scientific problems like AI, the Taiwanese-born professor plans to push the capability of multi-core computing to the next level.

Things were different in the 1980s

Back in the 1980s, as Prof. Hwu recalls, Taiwan was slow in developing computer technology. At National Taiwan University, where Prof. Hwu earned his bachelor’s and master’s degrees in computer science, there was only one computer available for students studying programming. Hundreds of students had no choice but to line up every day to submit code in exchange for reports of their programs’ results.

“It was so sad,” says Prof. Hwu, who responded with a great solution: he assembled 100 computers using motherboards from the Apple II, a classic personal computer of the 1980s. “The factory for Apple II was based in Taiwan, and there were a large number of motherboards flowing into the black market,” he explains.

He set up a computer room, and the students were exhilarated. They soon made it the most popular place on campus. It was the first time Prof. Hwu had glimpsed the huge potential of computer technology. He was supposed to study solid-state physics, but changed his mind and switched his focus to computer science. After graduation, he was admitted to UC Berkeley to study computer science.

Meanwhile, the AI boom was starting. In the 1980s, Japan aggressively funded AI with its fifth-generation computer project. The objectives were to write programs and build machines that could carry on conversations, translate languages, interpret pictures, and reason like human beings. In response, the US government funded multiple projects at top universities, including UC Berkeley.

With this funding, Prof. Hwu and his mentor Prof. Yale Patt, a renowned computer architect, along with two other PhD students of Prof. Patt, Stephen Melvin and Michael Shebanow, proposed a micro-architecture for computers called HPS. The work was part of the Aquarius project, directed by Prof. Patt and Prof. Alvin M. Despain, which studied Prolog, a general-purpose logic programming language associated with artificial intelligence and computational linguistics.


Prof. Yale Patt (left) and Prof. Wen-mei Hwu (right)

While Prof. Despain and his students focused on Prolog, Prof. Patt and his students focused on the fundamental problems of micro-architecture, in order to provide a structure on which Prolog and other AI languages could run faster. Prof. Hwu’s PhD thesis addressed interrupt recovery, and introduced the reorder buffer and checkpoint recovery protocols. This provided one of the most important breakthroughs that changed the way microprocessors have been built since: the ability to retire, in program order, instructions that had been executed out of order.
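The idea behind the reorder buffer can be sketched in a few lines: instructions may finish executing in any order, but their results commit strictly in program order, so the machine always has a precise state to recover to after an interrupt. A toy model of that discipline (the names and structure are illustrative, not the actual HPS design):

```python
from collections import deque

class ReorderBuffer:
    """Toy model: issue in program order, complete in any order,
    retire strictly in program order -- the in-order retirement
    that makes interrupts precise."""

    def __init__(self):
        self.entries = deque()  # entries held in program order

    def issue(self, tag):
        self.entries.append({"tag": tag, "done": False, "result": None})

    def complete(self, tag, result):
        # Completion notifications may arrive out of program order.
        for e in self.entries:
            if e["tag"] == tag:
                e["done"], e["result"] = True, result
                return

    def retire(self):
        """Commit the longest finished prefix, preserving program order."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            retired.append(self.entries.popleft()["tag"])
        return retired

rob = ReorderBuffer()
for tag in ["i1", "i2", "i3"]:
    rob.issue(tag)
rob.complete("i3", 42)        # i3 finishes first, out of order
assert rob.retire() == []     # but i1 is not done: nothing commits yet
rob.complete("i1", 7)
rob.complete("i2", 9)
assert rob.retire() == ["i1", "i2", "i3"]  # commits in program order
```

The key property is visible in the first `retire()` call: even though a later instruction has finished, nothing is allowed to commit past an unfinished earlier one, so an interrupt at any moment sees a clean, sequential machine state.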

“Out-of-order execution first showed up in the mid-1960s in the floating point unit of the IBM 360/91. But the 360/91 had many limitations, which we corrected with HPS in 1985. It is noteworthy that no one had any follow-on products of this kind after the IBM 360/91, until we came along and solved those limitations. One of the most important things we provided was in-order retirement, the hallmark of Wen-mei’s thesis,” says Prof. Patt.

After graduation in 1987, Prof. Hwu led a UIUC research group called IMPACT (Illinois Micro-architecture Project using Algorithms and Compiler Technology), which has since delivered new compiler and computer architecture technologies to the computer industry. “It was the time when Intel was working on superscalar processors, and they adopted some of our technologies in industrial applications,” says Prof. Hwu. In the 1990s, Prof. Hwu received various awards for his contributions to the industry, including the Maurice Wilkes Award, a top prize in computer architecture. He is also a fellow of both the IEEE and the ACM.

Revolutionising GPU for 10 years

From the 1980s to the 2000s, the CPU-based microprocessor drove rapid improvements in application performance and software functionality. But around 2000, Prof. Hwu realized that the growth in clock frequency, and in the amount of productive work a single CPU could perform in each clock cycle, was slowing down. So he turned to research on parallel computing. Soon, Intel and Nvidia approached Prof. Hwu, looking for advanced algorithms that could exploit multi-core processing.

Intel proposed a dual-core micro-processor, while Nvidia offered a GPU-centered supercomputing proposal with 16 cores. The professor’s choice was easy — he began collaborating with Nvidia’s Chief Scientist David Kirk, who made huge contributions to the development of the GPU and of CUDA, a parallel computing platform that enables GPUs to be used for general-purpose processing. It was a productive time for Prof. Hwu. He was selected as the principal investigator of the first Nvidia CUDA Center of Excellence at UIUC, and published with David Kirk a tutorial masterpiece called Programming Massively Parallel Processors, a guide for students learning to construct parallel programs.
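The programming model CUDA introduced — many lightweight threads, each responsible for one data element — is what makes GPUs approachable for general-purpose work. A Python stand-in for the canonical CUDA “vector add” kernel hints at the idea (a real kernel would be written in CUDA C and launched across thousands of hardware threads):

```python
# Python stand-in for the canonical CUDA "vector add" kernel:
# each of N logical threads computes exactly one output element,
# the pattern CUDA maps onto thousands of GPU cores.

def vecadd_kernel(thread_id, a, b, out):
    # In CUDA C this body would index by blockIdx/threadIdx;
    # here thread_id plays that role.
    out[thread_id] = a[thread_id] + b[thread_id]

def launch(n_threads, kernel, *args):
    # On a GPU these iterations would run concurrently; serializing
    # them here is behaviorally equivalent for a data-parallel kernel.
    for tid in range(n_threads):
        kernel(tid, *args)

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * 4
launch(4, vecadd_kernel, a, b, out)
assert out == [11.0, 22.0, 33.0, 44.0]
```

Because each thread touches only its own element, there are no data dependencies between iterations — exactly the property that lets the GPU run them all at once.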


“What a quick ten years it’s been,” says Prof. Hwu, “I have learned a lot about parallel computing.”

AlexNet’s huge success in 2012 opened the era of neural networks, and its adoption of GPUs inspired the subsequent use of specialized processors for AI. “Without GPUs, AlexNet might have spent two more years in training,” says Prof. Hwu.

Says Prof. Patt: “Wen-mei (Hwu) showed how to harness the enormous parallelism capability of GPUs for solving classical problems in physics, chemistry, and other sciences. Many have jumped on the GPU algorithm bandwagon since. Machine Learning was among the first research fields to implement Wen-mei’s insights.”

GPUs have rapidly improved over the last 10 years, and machine learning models like Google Translate and Surmind can now be trained in as little as two weeks. However, machine training remains relatively sluggish. And so, Prof. Hwu plans to spend the next two years on algorithms designed to enable next-generation hardware to shorten this training time.

“Tech companies are now hungry for talent in computer architecture because their technology requires better hardware designs to achieve cost minimization,” says Prof. Hwu. With regard to the traditional preference of computer science students for software over hardware, Hwu says these days, with tech companies’ focus evolving to include microprocessors, analog and parallel computing, “hardware is getting hot again.”

Breaking the Cultural Barrier

Prof. Hwu recently gave the keynote speech at the UIUC US-China Innovation & Development Forum, and shared his insights about innovation in computational AI. The professor has been giving presentations and lectures for 31 years. He is well-accustomed to standing in front of hundreds of students.

UIUC Professor Wen-mei Hwu. Credit: UIUC CSSA

However, with his cultural background, communication was not always easy for Prof. Hwu. When he landed in the United States he could barely speak English, and the biggest challenge was giving clear and persuasive presentations. Asian students are not usually inclined (or even allowed) to argue with their teachers, while American students are nurtured in critical thinking, presentation, and debate. Back in the day, Prof. Hwu was not accustomed to that culture. “It might be one of the reasons why Chinese professors were rare in the US back in the 1980s.”

“Chinese professors are typically much more quiet, deferential, and humble,” says Prof. Jian Ma, an alumnus of the elite Fudan University and Penn State PhD, and now an associate professor at the School of Computer Science at Carnegie Mellon University. Prof. Ma taught at UIUC for six years and knew Prof. Hwu well. Regarding the challenges of cultural acclimatization, Prof. Ma suggests Chinese professors should not hesitate to speak up more.

Things are changing. Chinese researchers are now a major driving force in AI research, from theory to application. With the boundaries between different disciplines blurring and many of the most exciting areas emerging at the intersection of different research fields, it seems that more and more younger Chinese professors are working in interdisciplinary areas.

“Younger generation Chinese academics like me still have a lot to learn from more senior Chinese colleagues like Prof. Hwu on how to successfully thrive in the academic environment,” says Prof. Ma.

There is no doubt Prof. Hwu has achieved great success in academia so far. If you look at the professor’s bio on Wikipedia, you might get lost in his titles: Chief Scientist of the Parallel Computing Institute at UIUC, Sanders-AMD Endowed Chair in Electrical and Computer Engineering, PI of the first CUDA Center of Excellence, and on and on. Tens of thousands of students have benefited from his course and its textbook, Programming Massively Parallel Processors, published in 2010. The book has been extremely popular, with more than 10,000 copies sold to date.

The accomplished 55-year-old professor is in the prime of his career (one’s 50s and 60s being the golden age for scientists), and seems poised to further revolutionise the world of computational AI with even more stunning achievements in the near future.


Feature image credit: UIUC CSSA

Author: Tony Peng, Synced Tech Journalist | Editor: Michael Sarazen
