Andrew Ng is the Chief Scientist of Baidu Research in Silicon Valley. At the three-year anniversary of the launch of Baidu’s speech API, Synced held a special interview with Ng to learn more about the progress of AI research at Baidu, why and how he became an AI expert, and other topics related to AI.
I. Artificial Intelligence Research at Baidu
On May 16th, 2014, Baidu officially announced that it would invest 300 million USD in a new research-and-development centre in Silicon Valley that would house up to 200 employees. The team is led by Andrew Ng, who most recently headed Stanford University’s artificial-intelligence lab. (read more on WSJ)
Before Ng joined, Baidu had already made research progress in deep learning. In 2013, Baidu founded the Institute of Deep Learning (IDL), which made a lot of progress in image recognition, image-based search, speech recognition, natural language processing, semantic intelligence, and machine translation research. The lab was directed by Kai Yu (who joined the company in 2012 and left in 2015), while Yanhong Li, the CEO of Baidu, was its president.
The first thing Ng did after joining Baidu was to purchase a lot of graphics processing units (GPUs).
“He ordered 1,000 GPUs and got them within 24 hours,” Adam Gibson, co-founder of deep-learning startup Skymind, told VentureBeat. “At Google, it would have taken him weeks or months to get that.” (read more on Venturebeat)
With the company’s strong support, Ng built a GPU cluster for deep learning, making Baidu the first company to have GPU clusters for deep learning. In recent years, Baidu has continued to invest in GPUs and supercomputers, pushing research in deep learning forward.
Ng also built Baidu Brain, after having founded Google Brain.
Figure 1. The Research Focuses of Baidu Brain
Baidu Brain’s official site clearly shows Ng’s AI research focuses: machine learning, speech, images, natural language processing, and user profiling.
In September 2016, Ng introduced Baidu’s open source deep learning platform PaddlePaddle to the audience of the Baidu Technology Innovation Conference. It is a cloud-hosted distributed deep learning platform that supports GPU computing, data parallelism, and model parallelism. It also supports sequential input, sparse input, and large-scale model training, and it can train deep learning models with only a few lines of code.
Currently, PaddlePaddle is used internally by Baidu engineers for research and has already helped with the development of some launched products. The predecessor of PaddlePaddle is Paddle (Parallel Distributed Deep Learning), a platform that Baidu built in 2013.
PaddlePaddle is another open source deep learning platform, following Google’s launch of TensorFlow.
The following month, Baidu also announced its open source benchmarking tool, DeepBench. DeepBench evaluates how well hardware platforms perform deep learning operations, and helps developers optimize deep learning hardware to further research progress.
Figure 2. Partners of Baidu’s speech API at its three-year anniversary
“There are several types of AI technologies behind Baidu Brain. Among them, speech technology is the most developed one,” said Ng during the three-year anniversary of Baidu’s speech API launch.
Figure 3. The three-year anniversary of Baidu’s speech API launch
For a long time, having humans communicate with machines has been a dream of the Human-Computer Interaction field. In recent years, with the widespread application of deep neural networks, computers’ ability to understand natural speech has drastically improved. Many technologies are involved, including speech recognition, speech synthesis, and voice input research, which Ng discussed during the event.
“Over the years, our team has optimized voice recognition systems continuously. We started with a DNN model in 2012, added better features, and then started using Sequence Discriminative Training, as well as an LSTM model, in addition to CTC. This year, we developed the Deep CNN model, which made a lot of progress.”
Baidu also launched Deep Speech 2 in November of 2015. It is an end-to-end deep learning system for speech recognition that has reached a 97% accuracy rate, and MIT Technology Review recognized it as one of 2016’s top 10 Breakthrough Technologies.
As companies compete to perfect speech recognition, Microsoft’s English recognition technology has reached an accuracy that nearly matches that of humans. However, using computers to generate speech (a process usually called speech synthesis or Text-To-Speech, TTS) still depends heavily on concatenative TTS, which requires a database that stores a large number of short speech fragments recorded by a single person. During the concatenative TTS process, the system concatenates these fragments into complete sentences in order to generate the actual speech.
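The concatenation step described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the fragment table, the integer "samples" standing in for recorded audio, and the greedy longest-match lookup are hypothetical, and production systems additionally choose among candidate recordings and smooth the joins between fragments.

```python
# Hypothetical fragment database: text unit -> recorded samples.
# Toy integers stand in for real audio samples from a single speaker.
FRAGMENTS = {
    "hello": [1, 2, 3],
    "wor": [4, 5],
    "ld": [6],
    "world": [7, 8, 9],
    " ": [0],
}


def synthesize(text, fragments=FRAGMENTS):
    """Cover the text with stored fragments and concatenate their samples."""
    samples = []
    i = 0
    while i < len(text):
        # Greedy choice: try the longest fragment that matches at position i.
        for j in range(len(text), i, -1):
            unit = text[i:j]
            if unit in fragments:
                samples.extend(fragments[unit])
                i = j
                break
        else:
            # No stored recording covers the remaining text.
            raise KeyError(f"no recorded fragment covers {text[i:]!r}")
    return samples


print(synthesize("hello world"))
```

Because the output can only be assembled from what was recorded, any text the database does not cover raises an error in this sketch, which hints at why concatenative systems need such large fragment collections.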
In September 2016, Google DeepMind announced a breakthrough in speech synthesis research: WaveNet, which narrowed the gap between machine-synthesized speech and human performance by 50%.
“Our speech synthesis model is getting better. We have had several technical breakthroughs over the past few years, including in the quality of voice synthesis. Now, Baidu is a leader in China’s voice synthesis industry,” said Ng. According to Baidu, its sentiment synthesis technology focuses on adding sentiment to synthesized speech. So far, the technology can produce a voice that sounds very close to a real person. Early last year, the company even used this technology to reproduce the voice of the deceased celebrity Leslie Cheung.
In 2016, we also saw new progress made by deep learning in image processing (recognition accuracy), natural language processing, machine translation (Google’s neural machine translation system), and so on.
For example, sequence-to-sequence models made more progress in natural language processing tasks. Ng stated, “Researchers may build new ways to make natural language processing systems better, and some of their ideas may lead to breakthroughs. For example, we have some research results in both word embedding and cross-modal learning. It is very exciting to learn computer vision and natural language processing at the same time.”
Ng thinks that transfer learning and multi-task learning are both very good research directions. He used the 2015 research results of Baidu’s NLP team as an example: “If you study the translation of multiple languages at the same time, it is more effective than studying just one.”
Last year, Google’s neural machine translation caught the industry’s attention. However, from Synced’s interview with Baidu’s NLP team, we learnt that the company’s online translation system was already using neural networks a year before Google’s. In 2015, Baidu published a paper titled “Multi-Task Learning for Multiple Language Translation” at the ACL conference, discussing how to use NMT technology to handle translation into multiple languages and the problem of scarce language resources. This is the multi-task learning mentioned by Ng above.
We also asked Ng about turning research into products.
“Companies in China, the U.S., and the rest of the world are developing AI technologies and deploying them into the market very quickly. Yet many people don’t know that a lot of these deployments actually happened first in China, not all of them but quite a few. In the specific example of using neural networks to learn sequences for machine translation, Baidu actually figured out how to build and deploy it before Google. And there are more examples besides this,” Ng said.
“The Chinese tech industry is developing at an exciting speed. The astonishing fact is before us: a lot of the current technology is first deployed in China, and we only see it a year later in the U.S. However, when people think about it, they usually get the timeline reversed.”
This may be Ng’s strongest affirmation of China’s research efforts in AI.
In October 2016, the White House published a report named “National Artificial Intelligence Research and Development Strategic Plan”, which mentioned that China has made more progress than the U.S. in artificial intelligence research: China has already surpassed the United States in the number of publications on deep learning and deep neural network research. Of course, many industry insiders doubt this conclusion, arguing that quantity alone does not attest to the quality of these papers.
Nonetheless, one of the recent surveys from Goldman Sachs states that the United States and China are likely to be major competitors in the arena of artificial intelligence development.
II. Ng’s Road to AI
Born in 1976, Ng turned 40 in 2016. He is known as one of the four big names in deep learning, alongside Geoffrey Hinton, Yoshua Bengio and Yann LeCun. And yes, some people have questioned whether he belongs on that list.
VentureBeat gave the following explanation, “whereas Bengio made strides in training neural networks, LeCun developed convolutional neural networks, and Hinton popularized the restricted Boltzmann machines, Ng takes the best [of research developments], and makes improvements.”
Ng was born in London. His father was a doctor from Hong Kong, and Ng spent most of his childhood in Hong Kong and Singapore. When he was young, his father’s interest in applying artificial intelligence to the medical field influenced him very much.
He told us, “I started programming when I was 16. I was living in Singapore, and my father, a doctor, was interested in applying AI to health care. So I was fortunate to have some books on AI back then. I started learning about AI when I was very young, actually when I was about 12. And then when I was 16, I was fortunate to have an internship at the National University of Singapore, where I got to do research on neural networks and even wrote a short research paper with professors. It wasn’t a very good research paper, so I don’t recommend reading it, but since then I have been fascinated by neural networks and their ability to learn from data.”
When Ng was 21, he received his bachelor of science in computer science from Carnegie Mellon University. After that, he received a master’s degree from MIT in 1998, and a doctoral degree from the University of California at Berkeley. His supervisor was Michael I. Jordan.
After getting his doctoral degree, Ng started his career at Stanford University, where he later became an Associate Professor in Computer Science and Electrical Engineering, and also the director of Stanford’s Artificial Intelligence Lab.
In 2010, Ng joined Google’s X Lab as a consultant while keeping his professorship at Stanford. He was one of the academic researchers who moved into industry.
From 2010 onward, artificial intelligence and deep learning became very popular, and more and more talented academics joined big companies instead of conducting research in universities, for instance Geoffrey Hinton, Russ Salakhutdinov, and Fei-Fei Li. This phenomenon worried the academic world, since a shortage of gifted researchers in universities might affect how young talent is educated.
However, Ng has a different perspective. He thinks that companies will also try to educate and develop talent, and might be even more successful at it than schools.
“Baidu’s recruitment department is investing a lot into education and training because we need talented people. That is why Baidu provides numerous courses for self-learning, such as deep learning, computer vision, natural language processing, and speech recognition. In fact, the Baidu Silicon Valley office has been called the place to learn artificial intelligence in Silicon Valley. So I think it’s very promising that companies, rather than universities, can help to train more AI talent to meet future needs.”
Launching educational programs is another of Ng’s achievements.
In 2008, Ng launched the Stanford Engineering Everywhere (SEE) project, releasing Stanford courses to the public for free. He also taught a few classes in machine learning, which included video lectures and the study materials of course CS229 at Stanford. In August 2011, Ng co-founded Coursera, which became the biggest Massive Open Online Course (MOOC) platform in the world.
In the same year, Ng, Jeff Dean and Greg Corrado founded Google Brain. It began when Ng first told Dean about his project at X Lab, named Project Marvin. The two then used their spare time to develop Google Brain; they later asked neuroscientist Greg Corrado to join.
During his time at Google Brain, Ng famously trained a neural network on YouTube videos so that it learned to identify videos containing cats. This was a landmark in the field of AI.
Over the years, in his roles as PhD student, Stanford professor, and Chief Scientist at Baidu, Ng has led and guided the growth of many successful artificial intelligence teams. Based on these experiences, he recently wrote to several companies operating with data but lacking in deep learning knowledge, recommending that they hire their first Chief AI officers.
Original Article from Synced China http://www.jiqizhixin.com/article/2007 |Author: Yazhou Li | Localized by Synced Global Team: Jiaxin Su, Meghan Han, Rita Chen