On March 29, the “Joint Laboratory of Language Intelligence and Human-Machine Interaction” opened its doors in Beijing. The lab is the result of a collaboration between Chinese AI company Mobvoi and the NLP and machine translation research teams from the National Laboratory of Pattern Recognition at the Chinese Academy of Sciences’ Institute of Automation (CASIA). It will focus on core technologies in the field of human-machine interaction, such as natural language understanding, multi-turn dialog management, question-answering systems, and machine translation.
Mobvoi was founded in October 2012 and is an industry-leading AI research company in the fields of voice recognition, semantic analysis, and vertical search. According to Reuters, following its Series C funding round in November 2015, Mobvoi had received a total of 75 million US dollars in venture capital from Google, Sequoia Capital, Zhen Fund, SIG Asia Investment, Perfect Optronics, and GoerTek.
“We can’t do research only when the need arises, or be satisfied with imitating new technologies and using open-source algorithms from other countries. Only by investing in leading-edge technologies and core algorithms can we achieve breakthroughs in AI,” says Mobvoi founder Zhifei Li. After obtaining his PhD from Johns Hopkins University in the US, Li joined Google Research, where he worked on machine translation research.
Guided by this philosophy, Mobvoi places no restrictions on investment in collaborative research. “As long as there is a possibility of meaningful results, it doesn’t matter even if it doesn’t directly benefit our technology. We are, in a sense, idealists, but we truly wish for breakthroughs,” Li said.
In fact, what brought the two sides together for this three-year collaboration is their shared optimism for the future of NLP and human-machine interaction, as well as the complementary nature of their research resources. The research focus of CASIA’s National Laboratory of Pattern Recognition is machine translation, and since machine translation is a core application of NLP technologies, the team there has a very solid record of NLP research as well. For its part, Mobvoi’s ambition in NLP has been clear since day one. Two years ago, Mobvoi began commercializing AI-powered smart devices. With the release of its Ticwatch smartwatch and Ticmirror smart rearview mirror for cars, Mobvoi was able to amass a large customer base, data tailored to its needs, and a system that enables end-to-end development.
To Zhifei Li, there is a large difference between corporate R&D and academic R&D: corporate R&D seldom focuses on the long term. Startups must therefore stay attuned to leading-edge technologies and seek collaborations with academic research labs to secure a long-term technological advantage. He used the rise of deep learning as an example. “The ‘Father of Deep Learning,’ Geoffrey Hinton, did related research in academia for many years. In 2007, while I was interning at Microsoft Research’s voice recognition group, he was already collaborating with Microsoft on incorporating deep learning algorithms into Microsoft’s systems. Only in 2012, after huge breakthroughs in voice recognition, did Google join in. In the US, most technological discoveries happen in academia. This is a classic model in the West.”
The plan for this collaboration is to help both parties expand their product and service offerings: starting with building NLP systems for targeted applications, then extending to innovations in scenario construction and algorithm design, with the final goal of designing algorithms and systems that can evolve and scale. Currently, the bottleneck for NLP-based human-machine conversation systems is context understanding. For example, in a music player use case, a user may say “I want to listen to a Jay Chou song,” followed by “Does he have rock songs?” Or, in a smart car use case, a user may say “I need navigation to Guo Mao,” then “Find the nearest parking lot.” Both use cases pose a context understanding problem: the system must resolve references like “he” and “the nearest” against earlier turns.
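The article does not describe Mobvoi’s actual dialog architecture, but one common way such cross-turn references are handled is slot carryover: the system keeps a dialog state of entities mentioned in earlier turns and uses it to fill pronouns in later ones. Below is a minimal, hypothetical sketch of that idea in Python; all names (`DialogState`, `resolve`) are illustrative, not Mobvoi’s API.

```python
class DialogState:
    """Tracks entity slots mentioned in earlier conversation turns."""

    def __init__(self):
        self.slots = {}  # e.g. {"artist": "Jay Chou"}

    def resolve(self, parsed, pronouns=("he", "she", "it")):
        """Replace pronoun slot values with entities from earlier turns,
        then remember the resolved slots for future turns."""
        resolved = dict(parsed)
        for slot, value in parsed.items():
            if value in pronouns and slot in self.slots:
                resolved[slot] = self.slots[slot]  # carry over earlier entity
        self.slots.update(resolved)
        return resolved


state = DialogState()
# Turn 1: "I want to listen to a Jay Chou song."
state.resolve({"intent": "play_music", "artist": "Jay Chou"})
# Turn 2: "Does he have rock songs?" -- "he" should resolve to Jay Chou.
turn2 = state.resolve({"intent": "filter_music", "artist": "he", "genre": "rock"})
print(turn2["artist"])  # Jay Chou
```

Real systems replace the toy pronoun table with learned coreference and state-tracking models, but the core bookkeeping, remembering what was said so the next utterance can be interpreted, is the same.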
“Our longstanding goal is to apply voice interaction in three major use cases: wearables, smart home, and smart car,” Zhifei Li said. “The future trend is screen-less and hands-free voice interaction; in other words, applying voice interaction in situations where hands and eyes are inefficient. On a technical level, the future trend is also about combining deep knowledge of the physical world with voice interaction. Just as in human conversation, a human-machine interaction system can achieve rapid, meaningful, and effective communication only when it has the knowledge to do so. This requires solving problems in knowledge representation, logical reasoning, and more. The ultimate objective at the core of our company is to develop an ever-present virtual assistant that understands everything you say.”
Beyond the research goals, Zhifei Li also wishes to create an effective model for industry-academia collaboration. In the past, such collaborations usually benefited only one party: either a company supports academic research whose results are not novel enough to grow the company’s products and services, or academia merely supplies human resources to the company without receiving anything substantial in return. Only a model that solves real, meaningful problems for the academic institute while helping the company apply research results to its business can make these collaborations mutually beneficial and effective, achieving the dual goals of technological innovation and industry development.
“This is not a fast, short-term process. It takes a long time and patience to achieve the expected results,” Zhifei Li said. “In fact, many academic researchers know very little about industry production systems. We hope that in this collaboration they can take some time to learn our system. That way, they won’t need to start from scratch; instead, they can directly verify algorithms on a system we spent so many resources building. I think this is the interesting part of research. Solving a unique problem in a unique system is a lot more fun than solving a problem from years ago, and it will be much better received than improving the accuracy of some algorithm by 5%. Our human-machine conversation system is based on NLP, and it’s intended to be used directly by users in a mobile environment. It’s a very new system. Trying to find new directions here is akin to building a new city and growing plants within it, instead of trying to develop better seeds in an old city. Not only is this type of research more likely to yield results, it will also earn better industry recognition.”
Original article from Synced China http://www.jiqizhixin.com/article/2567 | Author: Jingyi Gao | Localized by Synced Global Team: Xiang Chen