A founding member of Google Brain and the mind behind AutoML, Quoc Le is an AI natural: he loves machine learning and loves automating things.
As a Stanford University PhD student in 2011, Le used millions of YouTube thumbnails to develop an unsupervised learning system that recognized cats. In 2014, he pushed machine translation performance with deep learning techniques and an end-to-end system that automatically converted words and documents into vector representations, laying the groundwork for Google’s subsequent breakthroughs in neural machine translation.

Since 2014, Le has set his sights on automated machine learning (AutoML). The process of building machine learning models essentially requires repetitive manual tuning: researchers try different architectures and hyperparameters on an initial model, evaluate performance on a dataset, make changes, and repeat the process toward an optimized model.
Le sees this as a simple trial-and-error problem that can be solved by machine learning.
In 2016, Le teamed up with a Google resident and published the seminal paper Neural Architecture Search with Reinforcement Learning. The core idea was akin to building blocks: The machine picks the components it needs from a defined space to build a neural network, then improves its accuracy through a trial-and-error technique, namely reinforcement learning. The results were promising: the machine-generated models matched the performance of the best human-designed models.
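The trial-and-error loop described above can be sketched as a toy REINFORCE search. Everything below, the three-decision search space and the scoring function, is an invented stand-in for training and validating real child networks:

```python
import math
import random

random.seed(0)

# Toy search space: one categorical choice per design decision.
SPACE = {
    "filter_size": [1, 3, 5],
    "num_layers": [2, 4, 8],
    "activation": ["relu", "tanh", "swish"],
}

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Controller: independent softmax logits per decision.
logits = {k: [0.0] * len(v) for k, v in SPACE.items()}

def sample():
    """Sample an architecture (one choice per decision) from the controller."""
    idx, arch = {}, {}
    for key, choices in SPACE.items():
        probs = softmax(logits[key])
        r, cum = random.random(), 0.0
        for i, p in enumerate(probs):
            cum += p
            if r <= cum:
                idx[key], arch[key] = i, choices[i]
                break
        else:  # numerical edge case: fall back to the last choice
            idx[key], arch[key] = len(choices) - 1, choices[-1]
    return idx, arch

def evaluate(arch):
    """Invented stand-in for training a child model and measuring accuracy."""
    score = 0.9 - 0.05 * abs(arch["filter_size"] - 3)
    score -= 0.01 * abs(arch["num_layers"] - 4)
    score += 0.02 if arch["activation"] == "relu" else 0.0
    return score

# REINFORCE loop: reinforce the choices that led to higher "accuracy".
baseline, lr = 0.0, 0.5
for _ in range(300):
    idx, arch = sample()
    reward = evaluate(arch)
    baseline = 0.9 * baseline + 0.1 * reward          # moving-average baseline
    advantage = reward - baseline
    for key, chosen in idx.items():
        probs = softmax(logits[key])
        for j in range(len(probs)):
            grad = (1.0 if j == chosen else 0.0) - probs[j]
            logits[key][j] += lr * advantage * grad   # policy-gradient step

best = {k: SPACE[k][max(range(len(l)), key=l.__getitem__)]
        for k, l in logits.items()}
```

In the real system, the evaluate step means training a child network on GPUs, which is why the compute cost discussed later in the interview matters so much.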
Le’s research contributed to the creation of Google Cloud AutoML, a set of tools that enables developers with limited machine learning expertise to train high-quality models. Unsurprisingly, AutoML quickly became a popular topic, with tech giants and startups alike following in Google’s footsteps and betting on the new tech.

Synced recently spoke with Le. In a wide-ranging video interview, the unassuming 36-year-old Vietnamese AI expert spoke about his inspiration, the tech behind AutoML, the road ahead, and AutoML’s important new role in the machine learning field. Read on for insight into the man behind so many transformative technologies. The interview has been edited for brevity and clarity.
At the upcoming AI Frontiers Conference on Nov 9 in San Jose, California, Quoc Le will give a talk on “Using Machine Learning to Automate Machine Learning,” with a special focus on Neural Architecture Search and AutoAugment.
The Inspiration
When did you start thinking about designing a new neural architecture search and what inspired you?
It goes back to around 2014 and happened gradually over time. I’m an engineer in machine learning. When you work with neural networks all the time, you realize that a lot of them require manual tuning of what people call hyperparameters — the number of layers in the neural network, the learning rate, and what type of layers go into these networks. AI researchers tend to start with some principles, and then over time the principles break down and they try different things. I followed some of the developments in the ImageNet competitions and I saw the development of Inception networks at Google.
I started thinking but wasn’t clear on what I wanted to do. I like convolutional networks, but I don’t like the fact that the weights in a convolutional network are not shared with each other. So I thought that maybe I should develop a mechanism to actually learn how to share weights in a neural network.
As I moved along I gained more and more intuition about this and then I looked into what to do. What researchers do is they take a bunch of existing building blocks, and then they try them out. They see some accuracy improvement. And then they say, “Okay, maybe I just introduced a good idea. How about keeping the good things I just introduced but replacing the old things with something new?” So they continue in that process — and an expert in this area could try hundreds of architectures.
Around 2016, I began thinking that if a process requires so much trial and error, we should be using machine learning to automate it, because machine learning itself is also based on trial and error. If you think about reinforcement learning and the way a machine learned to play the game of Go, it is basically trial and error.
I worked out how much real compute I would need to do this. My thinking was that a human might need to try a hundred networks, because humans already have a lot of intuition and a lot of training. If you used an algorithm to do this, you might be one or two orders of magnitude slower. I thought that being one or two orders of magnitude slower wasn’t too bad, as we already had sufficient compute power to do it. So I decided to start the project with a resident (Barret Zoph, who is now a Google Brain researcher).
I didn’t expect that it would be so successful. I thought that the best we could do was maybe 80 percent of human performance, and that would be a success. But the resident was so good that he was actually able to match human performance.
A lot of people said to me: “You spent so many resources to just match human level?” But what I saw from that experiment was that automated machine learning was now possible. It was just a matter of scale. So if you scale more, you get a better result. We continued into the second project and we scaled even more and worked on ImageNet, and then the results started to become really promising.
Can you tell us about Jeff Dean’s involvement?
Well, he was very supportive. Actually I want to credit Jeff Dean for his help in the inception of the idea.
I had lunch with Jeff in 2014 and he shared a very similar intuition. He suggested that if you looked closely at what researchers were doing in deep learning at that time, they were spending a lot of time tuning architectures and hyperparameters and so on. We thought there must be a way to automate this process. Jeff likes scaling and automating the difficult stuff that most tech people don’t like to do. Jeff encouraged me and I finally decided to do it.
How different is neural architecture search from your previous research?
It’s different from what I did before in computer vision. The journey came from a thought and grew over time. I also had some wrong ideas. For example, I wanted to automate and rebuild the convolution, but that was the wrong intuition. Maybe I should have accepted the convolution and then used the convolution to build something else? It was a learning process for me, but it wasn’t too bad.
The Technology
What sort of components does a researcher or engineer need to build a neural network model?
It does vary a little bit amongst applications, so let’s narrow it down to computer vision first — and even within computer vision, there’s a lot of stuff going on. Typically in a convolutional network you have an input which is the image and then you have a convolutional layer and then a pooling layer and then batch normalization. And then there’s an activation function, and you decide to make a skip connection to a new layer and things like that.
Within the convolutional blocks, you have many additional decisions. For example in the convolution, you must decide the size of the filter: Is it 1×1? 3×3? 5×5? You also have to decide on pooling and batch norm. Regarding the skip connection, you can choose from layer one to layer ten or layer one to layer two. So there’s a lot of decisions to be made and a large number of total possible architectures. There could be a trillion possibilities, but humans now only look at a tiny fraction of what’s possible.
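A back-of-the-envelope count shows how quickly these decisions multiply. The numbers below are illustrative assumptions, not Google’s actual search space:

```python
# Rough count of the per-layer decisions described above.
num_layers = 10
filter_sizes = 3   # 1x1, 3x3, or 5x5
pooling = 2        # pool after the layer or not
batch_norm = 2     # batch norm or not
activations = 3    # e.g. relu, tanh, swish

per_layer = filter_sizes * pooling * batch_norm * activations   # 36 variants
skip_pairs = num_layers * (num_layers - 1) // 2                 # 45 possible skips
total = (per_layer ** num_layers) * (2 ** skip_pairs)           # each skip on or off

print(f"{total:.1e} candidate architectures")
```

Even with these toy numbers, the count lands far beyond what any human team could try by hand.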
Your first paper about AutoML was Neural Architecture Search (NAS) with Reinforcement Learning, published in 2016. Since then your team has adopted evolutionary algorithms and begun using progressive neural architecture search. Could you elaborate on these improvements?
In the original paper we started out with reinforcement learning because we had this intuition that it’s humanlike, you can use trial and error. But I’m curious so I said “ok, how about we try evolution?” We did a lot of experiments and got some success, and realized the process could be done using evolution, so we changed the core algorithm.
One of the bigger changes was the use of ENAS (Efficient Neural Architecture Search). When you generate a lot of architectures, each one is normally trained and evaluated independently of the previous generation, so you don’t share any prior knowledge or information. But if you develop a sharing mechanism that lets a new network inherit some weights from previously trained networks, then you can train faster. So we did that.
Basically the idea is you create a giant network that has all the possibilities in it, and then search for a path within the network (to maximize the expected reward on the validation set), which is the architecture that you are looking for. Some of the weights will be reused for the next experiment. So there’s a lot of weight sharing going on. Because of that, we actually speed up by many orders of magnitude. The original NAS algorithm is way more flexible, but it’s too expensive. This is basically a new and faster algorithm, but it’s also a little bit more restrictive.
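The weight-sharing idea can be illustrated with a toy supernetwork. The scalar “ops” below are invented stand-ins for real convolutional layers, and the numerical-gradient update is just the simplest way to make the sketch self-contained:

```python
import random

random.seed(1)

# Toy supernetwork: each layer position holds a bank of candidate ops whose
# weights persist across sampled architectures (the weight-sharing idea).
NUM_LAYERS, NUM_OPS = 3, 4
shared = [[{"w": random.uniform(-1.0, 1.0), "b": 0.0} for _ in range(NUM_OPS)]
          for _ in range(NUM_LAYERS)]

def forward(path, x):
    """An architecture is a path: one op index per layer."""
    for layer, op in enumerate(path):
        params = shared[layer][op]
        x = params["w"] * x + params["b"]
    return x

def train_step(path, x, target, lr=0.02, eps=1e-4):
    """One numerical-gradient SGD step on the shared weights along `path`."""
    for layer, op in enumerate(path):
        for key in ("w", "b"):
            params = shared[layer][op]
            old = params[key]
            params[key] = old + eps
            loss_hi = (forward(path, x) - target) ** 2
            params[key] = old - eps
            loss_lo = (forward(path, x) - target) ** 2
            params[key] = old - lr * (loss_hi - loss_lo) / (2 * eps)

path = [0, 1, 2]
loss_before = (forward(path, 1.0) - 2.0) ** 2
for _ in range(150):
    train_step(path, x=1.0, target=2.0)
loss_after = (forward(path, 1.0) - 2.0) ** 2

# Any other path that reuses op 0 in layer 0 (say [0, 3, 3]) now starts from
# op 0's trained weight instead of from scratch: that inheritance is the speedup.
```

The point of the demo is structural: training one path updates weights that every other path containing those ops will inherit, which is why ENAS is orders of magnitude cheaper than training each candidate from scratch.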
The original NAS algorithm could generate better architectures as well as better hyperparameters, better data augmentation strategies, a better activation function, a better initialization, and so on. So far we have only managed to use the new ENAS algorithm for architectures, not, for example, for data augmentation or for optimizers.
Do you mean other components are generated by humans?
We identified architectures and data augmentation as two key areas that are very hard for human experts to design. So you get a lot of gains once you get those two things right. Most of the time you just use common optimization and standard practices. We just focus on automation of the components that provide the most benefits.
ENAS is still a recent development. There’s still a lot of black-box experimentation, and the research is moving rapidly.
I’ve heard one startup is now using a technique called generative synthesis. Also perhaps GANs? What are the pros and cons of the different search algorithms?
I’m not too sure about who is actually using GAN for architecture generation. I think it’s possible, although I’m less familiar with that.
Evolution and reinforcement learning are similarly general but again, if you don’t make any assumptions they can be quite slow. So people have developed the idea of progressive neural architecture search, where they start searching for a small component and then keep adding. That’s one idea that I think is very good.
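The progressive idea can be sketched as a beam search that grows an architecture one block at a time. The op list and the scoring function below are invented stand-ins for training and validating real candidates:

```python
# Candidate building blocks for each new position in the architecture.
OPS = ["conv3x3", "conv5x5", "pool", "identity"]

def score(arch):
    """Invented proxy for the validation accuracy of a trained candidate."""
    prefs = {"conv3x3": 0.30, "conv5x5": 0.25, "pool": 0.15, "identity": 0.05}
    s = sum(prefs[op] for op in arch)
    return s - 0.1 * max(0, len(arch) - 3)   # penalize overly deep stacks

def progressive_search(max_depth=4, beam_width=2):
    """Grow architectures block by block, keeping only the best extensions."""
    beam = [[]]
    for _ in range(max_depth):
        candidates = [arch + [op] for arch in beam for op in OPS]
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]       # prune to the top candidates
    return beam[0]

print(progressive_search())
```

The beam width and maximum depth are exactly the kind of assumptions that make this cheaper, and less general, than searching over whole networks at once.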
Speaking of ENAS, the core idea is weight sharing. You develop a big architecture and then find one path through it. ENAS is based on a number of other ideas like one-shot architecture search, where you build models and then figure out a way to share weights between them. I think the pros of RL and evolution are that they are very flexible: they can be used to automate any component in the machine learning pipeline. But they’re also very expensive. More specialized algorithms like ENAS and progressive architecture search make some assumptions, so they’re less general and flexible, but they’re usually faster. I don’t know about GANs. I think people use GANs to generate better images, but I don’t think they use them to generate better architectures.
What role does transfer learning play in AutoML technology?
There are two types of transfer learning. The first is architecture transfer learning, by which I mean you find a good architecture on an image recognition dataset and transfer it, for example, to object detection. The other type is weight transfer learning: you take a network pre-trained on a large dataset and then apply it to your own small dataset.
Let’s create the following scenario: flower detection is what we want to do. ImageNet has about one million images and the flower dataset is something like 1k images. You can find the best architecture on ImageNet and then reuse the weights; or you can just take a state-of-the-art model like Inception V3, train it on ImageNet, and do transfer learning on flowers, reusing the weights. The common method is just to transfer the weights, because most people don’t do architecture generation: you take your Inception V3 or ResNet and train it on ImageNet, and once the training is done, you do fine-tuning.
What I’m trying to argue is in reality you need both architecture transfer learning and weight transfer learning, which can be combined as follows:
- Combination One: You do architecture transfer learning first and then weight transfer learning.
- Combination Two: You do architecture search directly on your dataset and weight transfer learning from ImageNet.
- Combination Three: Basically you use ResNet and weight transfer learning. This is the state of the art.
- Combination Zero: Just architecture search on your target dataset, without any transfer learning.
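As a rough illustration of how the two transfer types compose (Combination One), here is a sketch using plain dictionaries in place of real networks; all names and numbers are invented for the example:

```python
# Hypothetical weights from pre-training a backbone on ImageNet.
imagenet_weights = {"conv1": [0.1, 0.2], "conv2": [0.3, 0.4], "head": [9.9]}

def architecture_transfer(searched_arch):
    """Reuse an architecture found by search; weights start from scratch."""
    return {name: [0.0] * width for name, width in searched_arch}

def weight_transfer(model, pretrained):
    """Copy backbone weights in, but keep the freshly initialized task head."""
    for name, weights in pretrained.items():
        if name != "head" and name in model:
            model[name] = list(weights)
    return model

# Combination One: take a searched architecture, inherit ImageNet weights for
# the backbone, and fine-tune the fresh 5-class flower head from scratch.
searched = [("conv1", 2), ("conv2", 2), ("head", 5)]
flower_model = weight_transfer(architecture_transfer(searched), imagenet_weights)
```

The head is the piece that always starts fresh, because the target task’s classes differ from ImageNet’s.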
Each of the combinations varies amongst datasets, because sometimes the dataset is larger and sometimes it’s smaller. Different combinations work better at different dataset sizes.
I predict that in the next few years Combination Zero, pure architecture search, is going to produce better-quality networks. We did a lot of research around this area and we know that it’s actually better.
An MIT and SJTU research paper has proposed a path-level network-to-network transformation.
That is a good idea. When I decided to work on architecture search I wanted to try that idea: You start with a good initial architecture and then you change and change, and try to get better and better all the time. But I thought that was a little bit unambitious, and I wanted something more ambitious!
The nice thing about publishing papers is that a lot of people share the same philosophy. They make changes to the algorithms, and we learn from those research ideas and improve our own research as well.
Which parts of AutoML still require human intervention?
We have to do a little bit of work on designing the search space. In architecture search, you have a search method, which is evolution, reinforcement learning, or the efficient algorithm. But we also have to define a space of building blocks for convolutional networks or fully connected networks. There are decisions to be made because right now AutoML has limited compute: we can’t search over everything because the full space is too large for us. For that reason, we have to design a smaller search space that still contains the promising possibilities.
Deep learning remains a black box technology. Can AutoML help users develop a better understanding of models?
We can develop some insights. For example, the search process will generate many architectures that look pretty similar. You can look at those architectures and then identify certain patterns, or you can develop an intuition about what architecture would be best for your dataset. For example, on ImageNet, a typical layer in a network found by AutoML has multiple branches, unlike more traditional networks where each layer has a single branch. On the level of branches, it’s hard to explain what’s going on.
In ImageNet, the sizes of objects vary. Sometimes you have a very big object in the centre of the image and sometimes a very small object occupying just a little part of the image, so you need different-sized filters. By combining different branches, you get better results. We will continue to look into this.
AutoML’s Challenges and Future
What do you see as today’s biggest challenge in AutoML research?
In the next couple of years, I think the biggest challenge will be how to make the search more efficient, because not many people want to use a hundred GPUs to solve problems on a small dataset. Figuring out how to make it even less expensive, without any tradeoff in quality, is a very big question.
The second big challenge will be how to make the search space a little bit less manual. Because the search space right now has some prior knowledge in it. So even though we claim that we do all AutoML, certain elements of prior knowledge go into the search space. I think it’s less than ideal and I want to work on that.
But I can tell you that the quality of the AutoML beta has been great and the cloud people are very happy. I can’t go into the product details, but I think the quality has been great. And the reception has been also fantastic.
Do you see opportunities to improve AutoML’s robustness?
Generally when we do AutoML, we have a separate validation dataset, so we can keep validating on that dataset to evaluate quality. Robustness is actually already part of the objective function of AutoML. As for additional constraints, like making models more robust to adversarial noise, or other outside constraints you want to fold into AutoML, it turns out AutoML has the strength to do this. This is a particularly great strength, because a lot of the time when you have a new constraint it’s very hard to figure out how to build it into a model by hand. But you can make a reward function that is a tradeoff between accuracy and robustness. The search will then evolve, and you will find a model with a good tradeoff between accuracy and robustness.
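The reward-blending idea can be sketched in a few lines. The candidate networks and their scores below are invented for illustration:

```python
def reward(accuracy, robustness, trade_off=0.5):
    """Blend two objectives; trade_off weights robustness against accuracy."""
    return (1 - trade_off) * accuracy + trade_off * robustness

# Invented candidates that the search might have produced.
candidates = [
    {"name": "net_a", "accuracy": 0.95, "robustness": 0.40},
    {"name": "net_b", "accuracy": 0.92, "robustness": 0.80},
    {"name": "net_c", "accuracy": 0.85, "robustness": 0.90},
]

# The search keeps whichever model scores best on the blended reward.
chosen = max(candidates, key=lambda c: reward(c["accuracy"], c["robustness"]))
print(chosen["name"])
```

Changing `trade_off` steers the search toward accuracy-first or robustness-first models without redesigning anything by hand, which is the strength described above.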
Let me give you an example. We had a researcher looking into how to design better networks to prevent adversarial examples. We did a small scale experiment on CIFAR10. He was able to find a network that is very robust against adversarial attacks, better than the state of the art. The result was good and the reason why it worked is that it’s very hard for a human to intuitively come up with a way to defend against attacks. But AutoML doesn’t care, it just tries a bunch of networks and then one network somehow internally has a mechanism to prevent attacks.
Is it possible to effectively compare the various AutoML solutions currently on the market?
You can do that. Whenever you have a task, you should create a separate dataset. You send it to the AutoML system and it will come back with some prediction models; you then evaluate those prediction models on your test set, which would be considered your golden set. The accuracy on that golden set is a good way to measure performance. I can’t comment very much on how we perform against other players in the market, but I think that’s available for people to take a look at and compare.
Do you think AutoML can generate the next groundbreaking network architecture like Inception or ResNet?
I think it already has. We recently used architecture search to find better networks for mobile phones. This is a tough area and a lot of people are working on it. It is hard to beat MobileNet v2, which is now the industry standard, but we generated a network that is significantly better: two percent better with the same speed on mobile phones.
And this is just the beginning. I think it will continue like this. In a couple of years, I predict that the best networks, at least in computer vision, will be generated rather than manually designed.
How do you feel about the hype around AutoML?
It is hard for me to comment on AutoML hype, but when I look at the number of people who want to use machine learning I see there’s a lot of room to make an impact by making machine learning more widely accessible. Certain techniques may be more hyped than others, but over time, I think there’s a very large area where we can make an impact.
Very few researchers have repeatedly made breakthroughs in machine learning. How do you maintain your creativity?
First of all, there are a lot of amazing researchers who are extremely creative and doing great work, so I wouldn’t say that I’m anything special. For myself, I have a number of problems that I am always curious to solve, and I really love solving them. It’s a combination of curiosity and perseverance. I just want to follow my curiosity and make a positive impact in the world. I also play soccer on the weekend and I love gardening — I don’t know if that helps my research, but it does help me relax a little bit!
I have to ask: how do you deal with failure?
If you love something then you will just keep on going, right? So I really love machine learning. Teaching machines how to learn is a new way to do computer programming: instead of writing the program, you teach a machine to do it. I like that concept at the fundamental level. So even when there’s a failure I’m still having fun!
Journalist: Tony Peng | Editor: Michael Sarazen