The University of Waterloo has the largest mathematics faculty in the world and is renowned for its computer science department. On August 18th, 2011, a computer science graduate student named Kaheer Suleman founded a company called Maluuba, with an intelligent assistant program he had invented as its product.
Kaheer Suleman, CTO of Maluuba
In February 2012, Maluuba secured $2 million in seed funding from Samsung Ventures. Within six months, it built an Android personal assistant application that rivalled Siri and Google Now. The product was named a finalist at TechCrunch Disrupt, described by some as “the Siri of the Android platform”. Yet Maluuba did not stop at challenging Apple; it went on to surpass the tech giant in voice search.
Maluuba ran its speech assistant through third-party service providers. In November 2012, the company released its natural language processing API, allowing mobile developers to add speech processing features to individual applications. In December 2012, Maluuba launched an online shopping feature driven by voice commands, putting it well ahead of Siri at the time.
In 2013, as more consumer electronics companies added new features to their products, Maluuba also expanded its work to smartphones, televisions, and self-driving cars. For example, LG’s G-series phones used Maluuba’s technology in their Voice Mate app. In February 2013, Maluuba announced an official move to the Windows Phone platform. The Windows Phone 8 version had most of the Android voice command features: finding restaurants, movie theatres, news, and stores (including voice shopping), setting alarms and notifications, scheduling meetings, making calls, sending texts and emails, navigating maps, checking the weather, and adding events to the Outlook calendar.
Research Scientists at Maluuba: Adam Trischler (left) and CTO Kaheer Suleman (middle)
Maluuba’s goal is to build machines with human-level understanding. One of artificial intelligence’s challenges is the lack of big datasets and the difficulty of simulation. There are millions of web pages on the internet; however, it is hard to convert all that text into a form machines can understand. Teaching machines to read is therefore a huge milestone. Maluuba knew that relevant research, such as deep reinforcement learning, needed time to develop.
Astounding research results came out in 2014. One example: an artificial intelligence from DeepMind used deep learning techniques to play video games without human supervision.
In August 2015, Maluuba secured a CA$9 million Series A round for deep learning research. The company soon opened a lab in Montreal, Canada, which housed 13 researchers under lab supervisor Kaheer Suleman. The lab focused on two branches of machine learning: dialogue and machine reading. The team was most interested in developing intelligence that solves real-world problems. It partnered with AI expert Yoshua Bengio from the University of Montreal, and with Richard Sutton from the University of Alberta, who specializes in reinforcement learning.
Yoshua Bengio, artificial intelligence expert from the University of Montreal
Richard Sutton, reinforcement learning expert and University of Alberta professor
Today, more than 50 million mobile devices (smartphones, self-driving cars, etc.) use Maluuba’s natural language processing services.
Maluuba also made headlines. In March 2016, the team published a paper on arXiv showcasing its latest research, A Parallel-Hierarchical Model for Machine Comprehension on Sparse Data. The lab had trained an algorithm that could read hundreds of fairy tales and correctly answer questions about unfamiliar stories. Researchers used Harry Potter and the Philosopher’s Stone to test the algorithm: it answered with high accuracy, 15% better than approaches without deep learning and 2% better than hand-coded solutions.
“Statistically, this is a huge improvement,” Yoshua Bengio said.
Maluuba believed that parameters measuring the spatiotemporal location of characters and the development of the plot help machines understand fictional situations better, and thus make better personal assistants. This is similar to Facebook’s approach of training its AI on a simplified version of The Lord of the Rings (Maluuba emphasized that it used longer, original texts).
In April, Maluuba released a demo on YouTube showing its AI agent Mercy reading and understanding a summary of Game of Thrones, much as a human would understand the show just by reading Wikipedia.
A Maluuba engineer then asked the artificial intelligence program, “Who attacked Jon Snow?”
“The Night’s Watch,” the program responded quickly and accurately.
The answer had not even been published on the internet when Mercy gave it.
Ask Siri the same question, and it either fails to understand what you are talking about or jumps straight to a search engine results page.
The video demo showed that Maluuba’s AI is capable of processing huge amounts of text data and answering complicated open-ended questions, a big breakthrough in machine learning and artificial intelligence. According to Mohamed Musbah, Vice President of Product at Maluuba, “people will see very interesting things [from us] in the coming months.”
In June 2016, Maluuba was under the spotlight again.
The company published a paper on machine reading (Natural Language Comprehension with the EpiReader), announcing a new system named EpiReader, which it claims is the best machine reading comprehension system in the world. Given a piece of text with a few words removed, the system can fill in the blanks from context. For this project, Maluuba used a large set of news articles from CNN and the Daily Mail as training data (Google’s DeepMind also used articles from these two outlets to build training models, while Facebook used children’s books). EpiReader’s accuracy currently stands at 74%, higher than both DeepMind’s and Facebook’s systems.
On June 30th, 2016, Maluuba published another paper named A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems.
Abstract: User simulation is essential for generating enough data to train a statistical spoken dialogue system. Previous models for user simulation suffer from several drawbacks, such as the inability to take dialogue history into account, the need of rigid structure to ensure coherent user behaviour, heavy dependence on a specific domain, the inability to output several user intentions during one dialogue turn, or the requirement of a summarized action space for tractability. This paper introduces a data-driven user simulator based on an encoder-decoder recurrent neural network. The model takes as input a sequence of dialogue contexts and outputs a sequence of dialogue acts corresponding to user intentions. The dialogue contexts include information about the machine acts and the status of the user goal. We show on the Dialogue State Tracking Challenge 2 (DSTC2) dataset that the sequence-to-sequence model outperforms an agenda-based simulator and an n-gram simulator, according to F-score. Furthermore, we show how this model can be used on the original action space and thereby models user behaviour with finer granularity.
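The abstract describes an encoder-decoder network that maps a sequence of dialogue contexts to a sequence of user acts. The toy sketch below is not Maluuba’s model: it replaces the learned GRU encoder-decoder with a plain tanh recurrent cell using random, untrained weights, and the dialogue contexts are placeholder vectors. It only illustrates the shape of the computation: the dialogue history is folded into an encoder state, and the decoder then emits one distribution over user acts per turn.

```python
import numpy as np

def rnn_step(x, h, Wx, Wh):
    # Simplified recurrent cell (a plain tanh RNN standing in for the GRU).
    return np.tanh(x @ Wx + h @ Wh)

def simulate_user(contexts, n_acts, dim=16, seed=0):
    """Toy encoder-decoder in the spirit of the paper: encode a sequence of
    dialogue-context vectors, then decode one user-act distribution per turn.
    All weights are random stand-ins; a real simulator would train them
    (e.g. on DSTC2)."""
    rng = np.random.default_rng(seed)
    d_in = contexts.shape[1]
    Wx_e, Wh_e = rng.normal(size=(d_in, dim)), rng.normal(size=(dim, dim))
    Wx_d, Wh_d = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))
    Wo = rng.normal(size=(dim, n_acts))
    # Encoder: fold the dialogue history into a single state vector.
    h = np.zeros(dim)
    for c in contexts:
        h = rnn_step(c, h, Wx_e, Wh_e)
    # Decoder: emit a softmax distribution over user acts for each turn.
    outputs, s = [], h
    for _ in range(len(contexts)):
        s = rnn_step(h, s, Wx_d, Wh_d)
        logits = s @ Wo
        e = np.exp(logits - logits.max())
        outputs.append(e / e.sum())
    return np.array(outputs)
```

In the full model, each context vector would encode the machine act and the status of the user goal, and the decoder could emit several intentions per turn; those details are omitted here.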
As of today, Maluuba is a leader in the field of artificial intelligence.
It has also been hailed as one of the fastest-growing startups in the world.
Interview Pt. 1 Technology & Research
Adam Trischler — Research Scientist at Maluuba, Lead of the Machine Reading Comprehension (MRC) Team
Eric Yuan — Research Engineer at Maluuba, Member of the MRC Research Team (left)
Synced: Can you explain the EpiReader framework to our readers? Especially the two neural networks.
Adam Trischler: The EpiReader uses two stages of processing to determine the answer to a question.
The first stage uses a bidirectional GRU to read the story and question word by word, then uses an attention mechanism, as in a Pointer Network, to select the story words that are the most likely answer candidates.
In the second stage, these candidate answers are inserted into the fill-in-the-blank question to form “hypotheses”; a convolutional neural network then compares each hypothesis to each sentence of the story, looking for textual entailment (TE). Entailment essentially means that two statements convey the same meaning. The hypothesis whose meaning is most similar to the story gets the highest entailment score. Finally, the entailment scores are combined with the initial probability scores from stage one to rank all candidate answers.
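The two-stage flow can be illustrated with a heavily simplified sketch. Everything below is a stand-in: toy random embeddings replace the bidirectional GRU, mean-pooled dot products replace the convolutional entailment network, and `XXXXX` marks the cloze blank as in CNN-style datasets. Only the overall Extractor-then-Reasoner pipeline mirrors the description above.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def embed(words, vocab, dim=8, seed=0):
    # Toy deterministic word embeddings (stand-in for learned GRU states).
    rng = np.random.default_rng(seed)
    table = rng.normal(size=(len(vocab), dim))
    idx = {w: i for i, w in enumerate(vocab)}
    return np.array([table[idx[w]] for w in words])

def extract_candidates(story, question_vec, vocab, k=2):
    # Stage 1 ("Extractor"): pointer-style attention over story words.
    H = embed(story, vocab)                # one vector per story word
    scores = softmax(H @ question_vec)     # attention weights over positions
    top = np.argsort(scores)[::-1][:k]
    return [(story[i], scores[i]) for i in top]

def rank_by_entailment(candidates, story_sents, question, vocab):
    # Stage 2 ("Reasoner"): plug each candidate into the cloze question to
    # form a hypothesis, then score its similarity to each story sentence.
    ranked = []
    for word, p1 in candidates:
        hyp = [word if w == "XXXXX" else w for w in question]
        h = embed(hyp, vocab).mean(axis=0)
        ent = max(float(h @ embed(s, vocab).mean(axis=0))
                  for s in story_sents)
        ranked.append((word, p1 * ent))    # combine stage-1 and stage-2 scores
    return sorted(ranked, key=lambda t: -t[1])
```

The real model combines the stage-one probability and the entailment score in a learned fashion; the simple product here just conveys that both stages contribute to the final ranking.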
Synced: EpiReader has made a lot of progress with unstructured data. Does it also accelerate progress toward machines capable of understanding knowledge and reasoning over it? Are we getting closer to automated knowledge acquisition?
Adam Trischler: This is still an area that requires a lot of research and development. Yet it is also very exciting, because moving forward in this area would bring us closer to true intelligence. It is a multi-faceted problem with several areas that need traction, and we are certainly optimistic about how we are moving forward. But I think we should be careful about getting too excited at this stage. We’ve done a lot over the last 12 months, and we will have to keep contributing to the field over the next three years. Our goal is to feed results back into research, see what the challenges are, and work out solutions to them. How quickly can you move forward? The challenge we face is more fundamental: it requires a lot of unsupervised learning, so that machines can learn from information without human intervention.
Synced: Your paper didn’t discuss error analysis. Can you explain to our readers how error analysis is done in this project?
Adam Trischler: A recent paper from Stanford, A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task, performs an analysis of the CNN dataset based on question types and their difficulty. We are in the process of testing our models on these question subsets to gain a better understanding of our error modes.
Synced: In the Reasoner, when you combine the hypothesis embedding with the sentence embedding, you simply feed the similarity score and the two embeddings into the GRU. Have you considered putting this combination into a deeper structure, for example, feeding the two embeddings into a multi-layer DNN/RNN?
Adam Trischler: We are currently considering deeper, more interesting approaches to comparing hypothesis and sentence embeddings.
Synced: EpiReader has shown better performance than the solutions developed by Google’s DeepMind, Facebook AI Research, and others. What does this imply for Maluuba?
Adam Trischler: This implies that Maluuba is achieving measurable success in its mission to become the most advanced NLP-focused AI lab in the world. Our results demonstrate that we are important contributors to this field and that we can devise creative and effective solutions to some of the big problems in NLP. We’re also excited to be pushing the limits of NLP research; our progress is a testament to the role startups have to play in bringing artificial intelligence closer to everyday life. While we’ve seen tremendous success in other areas of AI, such as speech recognition and image retrieval, there is still a long path ahead before machines comprehend (and actually “read”) text at the level humans do. We will continue to push for breakthroughs in language understanding and, along the way, gradually bring the technology to applicable markets.
Synced: How do you ensure the quality of your training data? Besides datasets from academia, will you pay for training data from industry?
Adam Trischler: I can’t speak to the mechanics of how we do it, and there are different ways of building your own datasets, but this is certainly an important area. A lot of it depends on the work that already exists; that is why we picked the CNN and CBT datasets: they are probably the most recent and most recognized data in this area. But moving forward, you’re absolutely correct that we will need to become a pioneer in developing and sharing datasets with the world, and this is also where we collaborate with different institutions.
Synced: In NLP, the vast majority of frameworks use a Seq2Seq model plus an attention mechanism. This framework has some problems of its own; are there ways to address them?
Adam Trischler: Sequence-to-sequence frameworks have become ubiquitous in NLP, particularly in machine translation, where they have proved very effective. Yet in our experience, Seq2Seq is not well suited to the question answering task EpiReader addresses. The task demands higher-level reasoning abilities that the Seq2Seq architecture does not have. Furthermore, Seq2Seq with attention blends individual word representations together in its hidden state, which seems to cause problems for answer determination, since the distinct meanings of words get blurred. Seq2Seq with attention is best suited to tasks where the elements of the input sequence align closely with those of the output sequence; this holds in translation but not necessarily in question answering. In another recent paper, Maluuba designed an Iterative Alternating Attention (IAA) mechanism that performed very well on the question answering task. This mechanism reads the question and the story multiple times, focusing on different parts each pass and building up an information state over time. In this way, the IAA mechanism simulates a chained reasoning process. See our paper Iterative Alternating Neural Attention for Machine Reading for more.
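The read-repeatedly idea behind IAA can be sketched in a few lines. The code below is an illustration, not the published model: fixed random weights stand in for the learned GRU state update, and the inputs are arbitrary word-encoding matrices. The loop only demonstrates the alternation the answer describes: glimpse the question, use that glimpse to glimpse the document, update a running state, and repeat.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(H, s):
    # Content-based attention: weight the encodings H by similarity to
    # the current state s, then return the weighted summary ("glimpse").
    w = softmax(H @ s)
    return w @ H

def iterative_alternating_attention(Q, D, steps=3, seed=0):
    """Toy loop over the IAA idea. Q and D are (length, dim) arrays of
    question and document word encodings; the learned GRU update of the
    real model is replaced by a fixed random projection."""
    dim = Q.shape[1]
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(2 * dim, dim)) / np.sqrt(2 * dim)
    s = np.zeros(dim)
    for _ in range(steps):
        q_glimpse = attend(Q, s)              # attend to the question...
        d_glimpse = attend(D, s + q_glimpse)  # ...then to the document
        s = np.tanh(np.concatenate([q_glimpse, d_glimpse]) @ W)
    # Final state plus an answer-attention distribution over document words.
    return s, softmax(D @ s)
```

Each pass refines the state with a fresh pair of glimpses, which is how the mechanism focuses on different parts of the text on different readings.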
Synced: Traditional NLP methods can’t really solve cognitive computing problems. To develop chatbots that can talk naturally with humans, do we need to develop other methods?
Adam Trischler: As far as conversational systems are concerned, I don’t think this is a failing of NLP as such. Basic intelligence, such as the ability to reason, comprehend, and analyze, involves a lot beyond the scope of natural language processing. We are heading in that direction now. For example, EpiReader focuses on machine understanding, and the idea behind it is reasoning over information. What we want to do in the future is let chatbots understand real human emotions and grasp a user’s situation before responding. They will also need perceptual abilities, such as vision and hearing. Taken together, these efforts aim at the Holy Grail of artificial intelligence. Progress here has been limited over the past 12 months, which is why we decided to double down on it.
Interview Pt. 2 Technology & Research
Mohamed Musbah— Vice President, Product at Maluuba
Synced: Are you currently using the aforementioned technologies to develop products? What is your area of interest?
Mohamed Musbah: We are very interested in industry applications, especially customer service, because it involves a lot of interaction in real scenarios, and we need to take user experience into account. In a real-world setting, when users report an error, they want the problem solved as quickly as possible. The question is how we can automate this reliably and address users’ problems. Our latest paper is on machine comprehension, in which we feed data to an artificial intelligence model to test whether it understands the semantics. The research is still in progress, and more details will be published in a few months. We are most interested in conversational agents and dialogue systems.
Synced: With the current technology, is Maluuba going to develop a service platform with an API, or develop your own product?
Mohamed Musbah: We are also very interested in products, and our research lab has done a lot of work toward them; however, turning that work into viable products takes a lot more effort. For the industry, this is a relatively new concept. We’re also working on other applications, such as conversational systems, and are committed to developing systems that let us talk to our users. We are also expanding the existing API with new functionality, which is the future direction of our efforts; ultimately, what we want to achieve is what I mentioned before: machine understanding. For example, think of a car manual. It often runs 200 pages, and no one reads the whole thing. We would like to feed its contents into neural networks that can help you understand it. This involves natural language processing and is the key to machine understanding. In other words, we provide the interface, users provide the content, and we deliver the analysis. That is the API we are working on. In addition, we provide offline models for situations when the network is unavailable.
Synced: Compared to IBM’s medical-text readers, what does Maluuba want to build by having its AI read a driver’s manual?
Mohamed Musbah: I can’t speak on behalf of another company about its goals, but I can talk about the general trend of the industry and how we’re different. One thing we’ve seen is that machine learning research, even in NLP, is applied to very narrow fields.
But we don’t think the future will be limited this way. If AI research moves into one narrow area, it cannot be applied anywhere else; such systems don’t have the capacity to learn one subject and apply that knowledge to another. We’re taking a different approach: rather than teaching the machine to work in a narrow domain, we teach it to understand people. The most important step is to endow machines with cognition, which is what people have. For example, when I read an article, I can understand it whether it’s fiction, a magazine, or a medical journal.
If we want machines to mimic humans, we need to give them reasoning capabilities. The EpiReader paper is part of this new line of NLP research, and we used the CBT and CNN datasets to test its accuracy. Can we teach machines to understand? Can we teach a machine to read a medical journal or a Harry Potter book like a ten-year-old? It doesn’t have to master any specialty: a ten-year-old can’t understand medical jargon, but can understand the basics.
Synced: Does this mean your product will not be bound to a specific domain, whether it’s an IT assistant or a medical assistant?
Mohamed Musbah: Of course. As a company, we cannot get involved in every vertical, but we can build a very general system to suit different needs. Everything will rest on the foundation of understanding human language, with the further challenges of teaching the machine reasoning, memory, understanding, and the ability to hold a conversation.
Synced: Can you share with our readers how Maluuba started and how you got to where you are today?
Mohamed Musbah: Around 2010-2011, several of us computer science students set up an artificial intelligence lab at the University of Waterloo. Our initial goal was simple: create machines that communicate with humans. It was a crazy idea at the time, though artificial intelligence and natural language processing have developed a great deal in the past five years. Now, we are glad to be working with many companies and supporting them with our products and technology.
Synced: How big is the Maluuba team now?
Mohamed Musbah: I don’t have a specific number. One of our goals is to solve the fundamentals of language processing, and we need a good team for that. Fortunately, our Montreal research centre is working on this and making a lot of progress.
We also want to build the world’s leading natural language lab. This is a very specialized field, and we want to meet more talented people. We are proud of the team we have now and look forward to expanding it. I think we are on the right track.
Synced: “Our vision is a world where intelligent machines work hand-in-hand with humans to advance the collective intelligence of the human species.” Maluuba’s vision certainly sounds exciting; can you help our readers visualize it better? Do you have concrete examples?
Mohamed Musbah: We are very optimistic about the future of artificial intelligence. In the past few decades, artificial intelligence has mainly been used to perform operational tasks. The more intelligent machines become, the more we can rely on them. Self-driving cars are a good example: we used to think of them as a distant technology, and now we are almost there.
Aside from this, robotics is advancing at an exciting pace. Robots can execute many tasks and liberate us from mundane ones. For example, to drink a cup of coffee, we have to operate a complex coffee machine. But what if we make the coffee machine intelligent? It will learn your preferences, help you prepare, and simplify everything.
Synced: Maluuba has three excellent advisors, all very well recognized in this domain. How did you initiate the cooperation with each of them, say, Professor Bengio? How do advisors help the company? How do you evaluate their research work?
Mohamed Musbah: I am very glad to have these excellent advisors. Dr. Bengio is very famous in the field of artificial intelligence. We have all reached the consensus that natural language processing (NLP) is a critical domain: artificial intelligence can drive a car now, but in understanding human language and semantic reasoning it is still very limited. Dr. Bengio and I have discussed at length how to advance research in this field. Dr. Bengio is now a member of our team, and our research directions are closely aligned.
Synced: What are the biggest challenges that confront your team? Are they technical?
Mohamed Musbah: We have done a lot of research over the last 8 to 10 months, and we are fortunate that, with EpiReader as an example, we were able to beat state-of-the-art results from companies like Google, Facebook, and IBM. Now, this is a really nascent area, which means that not every path you take will produce results. We are fortunate in where we are, and we have a lot in the pipeline that we are going to announce in the next 6 months, yet this is still completely uncharted territory, both for Maluuba and for everybody else in this space. That means we are going to face challenges we’ve never seen before. Technically speaking, the idea of teaching machines to reason was crazy in 2010, was crazy in 2012, and is still crazy in 2016, but not as crazy. Getting to that point is going to be incredibly difficult, but even if we get 20 or 30 percent of the way there, we will have done a lot. And that is what we are starting to do. It is very open-ended. It is still really, really early, but we are very excited about the progress we’ve made so far, and we believe there is still a lot more to be done.
Synced: How do you plan to compete with research teams from Facebook or Google, which are very resourceful?
Mohamed Musbah: That is a good question. It’s challenging for any startup to compete with resourceful, large organizations like Google, or whoever may call any of our researchers and say, “I’d like to offer you a million dollars to leave Maluuba and come to our team” (laughs). That’s the reality, and something we have to deal with. The first answer we have is passion. When we bring in the team, we explain vividly what we are going to do and how we’re going to try to solve these problems. If you look at the larger companies from a research point of view, it’s obvious that we work in different areas; it’s not just the difference between academia and industry. You can bring a lot of expertise and leverage into the lab.
The other aspect is the team and our shared vision: every single researcher plays a critical role in reaching it, and that is what we offer them when they join the company. A lot of it comes down to what the researcher fundamentally wants: a researcher who wants to be paid a million dollars and handle a crazy amount of work may not necessarily fit in at Maluuba. If a researcher wants to solve fundamental problems, they are aligned with our objectives. We’re going to prove to them that it’s worth it; we think we can manage this as a startup, and then we’re going to keep growing.
Maluuba’s headquarters in Waterloo, Canada
Synced: Will you remain a Canadian start-up, or do you plan to expand your labs to the rest of the world? What’s your plan?
Mohamed Musbah: We certainly want to do that. The world is global: we’re having this conversation when it’s 10 in the morning for me and 9 at night for you, and I appreciate you taking the time to talk with me. From a research point of view, we would also love to expand. This is how it should work. Canada is where we are today, but that doesn’t mean we’re not traveling all over the world to attend conferences, meet smart people, and look for collaborations. What we see happening in China is incredible. Maluuba is international, and we’ve been working with global companies. Even in China there are incredible companies doing amazing work and excelling in their areas. We haven’t yet explored the option of expanding our labs, but it is a goal.
Synced: Do you have any recommendations for other AI entrepreneurs and researchers?
Mohamed Musbah: At this point it’s exciting, because there are a lot of exciting things to do. We are at the forefront of addressing these issues, and the industry is very happy to support entrepreneurs in solving them, whether financially or strategically. However, I would like to offer a few reminders. First, make sure you can distinguish between fact and fiction. A lot of information on artificial intelligence is exaggerated, either from a lack of basic understanding or from over-excitement. Distinguishing fact from fiction helps you really understand the current situation and identify problems accurately. Second, try to solve the issues no one has solved before. In a few years, I truly believe the industry will separate those making brand-new products and solving novel problems from everyone else.
Original article from Synced China www.jiqizhixin.com | Localized by Synced Global Team: Meghan Han, Rita Chen, Jiaxin Su