Microsoft researchers in the US and Asia sent a shockwave through the AI community today with their paper “Achieving Human Parity on Automatic Chinese to English News Translation,” which introduces a neural machine translation system they say equals the performance of human experts in Chinese-to-English translation.
Although artificial intelligence has outperformed humans in tasks such as image recognition and speech recognition, many experts doubted machines could do the same in language translation. “Hitting human parity in a machine translation task is a dream that all of us have had,” said Xuedong Huang, a technical fellow in charge of Microsoft’s speech, natural language and machine translation efforts. “We just didn’t realize we’d be able to hit it so soon.”
Microsoft’s system was tested on the benchmark news story dataset newstest2017, which was developed by a group of industry and academic partners and released at last fall’s WMT17 research conference. To measure the translation quality accurately, Microsoft researchers hired bilingual human evaluators to compare Microsoft’s results with two independently produced human reference translations, instead of referring to traditional metrics such as BLEU and TER.
“The same source sentence can be translated in sometimes substantially different but equally correct ways. This makes reference-based evaluation nearly useless in determining the quality of human translations or near-human-quality machine translations,” says the paper.
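The paper's point about reference-based metrics can be illustrated with a toy word-overlap score. The sketch below is not BLEU or TER. It computes only a clipped unigram precision, a simplified ingredient of BLEU-style metrics, and the example sentences and the function name `clipped_unigram_precision` are invented for illustration.

```python
from collections import Counter


def clipped_unigram_precision(candidate: str, reference: str) -> float:
    """Share of candidate words that also occur in the reference.

    Each candidate word is credited at most as many times as it appears
    in the reference (clipping). This is a simplification: no higher-order
    n-grams and no brevity penalty, so it is not a full BLEU score.
    """
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    cand_counts = Counter(cand_tokens)
    ref_counts = Counter(reference.lower().split())
    overlap = sum(min(n, ref_counts[word]) for word, n in cand_counts.items())
    return overlap / len(cand_tokens)


# Hypothetical example sentences (not taken from newstest2017).
reference = "the talks ended without any agreement"
paraphrase = "negotiations concluded with no deal reached"  # same meaning, different words
near_copy = "the talks ended without any deal"              # almost the same wording

score_paraphrase = clipped_unigram_precision(paraphrase, reference)
score_near_copy = clipped_unigram_precision(near_copy, reference)

print(f"paraphrase: {score_paraphrase:.2f}")
print(f"near copy:  {score_near_copy:.2f}")
```

Under this toy metric the paraphrase shares no words with the reference and scores zero, even though a bilingual reader would likely accept it, while the near copy scores highly. That gap is one way to see why the researchers turned to human evaluators.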
According to the paper, Microsoft’s new machine translation system scored 69.0, indistinguishable from the human translation, which scored 68.6.
Huang told Synced that machine translation is the key to mastering natural language understanding (NLU), which researchers believe will facilitate the development of artificial general intelligence (AGI) — the long-range, human-intelligence-level target of contemporary AI technology.
“NLU does not have large datasets. However, machine translation does. We use the deep neural network to learn semantic representations, which can be applied to NLU. As we learn the expression of language, we may have a chance to solve NLU and improve Cognitive Services (a set of Microsoft’s machine learning algorithms),” says Huang.
Microsoft researchers focused on the Chinese (Mandarin) to English language pair as these are the two most used languages in the world, and sampled texts from the news domain because news stories have a wide content variety. Microsoft researchers caution that their results will not necessarily generalize to other language pairs or domains, even though the techniques used were not specific to languages or domains.
Huang attributes the breakthrough to three factors: increased computation capability provided by Nvidia’s GPUs; improved algorithms and a particularly deep neural network; and an optimized dataset, using engineering methods to wipe out low-quality data, or noise.
To improve the model’s accuracy and fluency, researchers used additional training methods. One is dual learning, which learns from both source-to-target and target-to-source translation data: a sentence is translated from Chinese to English and then back to Chinese, and the result is compared with the original sentence.
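The round-trip check behind dual learning can be sketched in a few lines. Everything here is a stand-in: the "models" are hypothetical word-lookup tables rather than neural networks, and `round_trip_score` is an invented name. In a real system the comparison would presumably feed a training signal back to both translation directions, which this sketch does not attempt.

```python
from typing import Callable, Dict, List

Translator = Callable[[List[str]], List[str]]


def make_word_translator(table: Dict[str, str]) -> Translator:
    """Build a toy word-by-word translator from a lookup table (assumption:
    a stand-in for a real translation model)."""
    def translate(tokens: List[str]) -> List[str]:
        return [table.get(token, "<unk>") for token in tokens]
    return translate


def round_trip_score(source: List[str], forward: Translator, backward: Translator) -> float:
    """Translate source -> target -> source and return the share of
    positions where the reconstruction matches the original."""
    if not source:
        return 0.0
    reconstructed = backward(forward(source))
    matches = sum(1 for a, b in zip(source, reconstructed) if a == b)
    return matches / len(source)


# Hypothetical toy vocabularies.
zh_to_en = make_word_translator({"我": "I", "喜欢": "like", "新闻": "news"})
en_to_zh_good = make_word_translator({"I": "我", "like": "喜欢", "news": "新闻"})
en_to_zh_bad = make_word_translator({"I": "我", "like": "喜欢", "news": "报纸"})  # "newspaper"

sentence = ["我", "喜欢", "新闻"]
good_score = round_trip_score(sentence, zh_to_en, en_to_zh_good)
bad_score = round_trip_score(sentence, zh_to_en, en_to_zh_bad)

print(f"consistent pair:   {good_score:.2f}")
print(f"inconsistent pair: {bad_score:.2f}")
```

A pair of models that agree reconstructs the original sentence exactly, while a mismatch in either direction lowers the score. The appeal of the idea is that this comparison needs no human reference translation.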
Another technique employed was deliberation networks, which train the model to repeatedly translate the same text. Similar to how a human might write multiple drafts, the deep neural network gradually improves and refines its output.
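The draft-then-refine idea can be sketched as a loop in which a second pass sees the whole first draft before revising it. This is only a minimal illustration under stated assumptions: `first_pass`, `refine`, and `deliberate` are hypothetical names, the first pass is a word lookup, and the second pass is a single hand-written agreement rule, whereas the system described in the article uses trained neural networks for both.

```python
from typing import Callable, List

Tokens = List[str]


def first_pass(source: Tokens) -> Tokens:
    """Toy first draft: word-by-word lookup (hypothetical vocabulary)."""
    table = {"他": "he", "喜欢": "like", "新闻": "news"}
    return [table.get(token, "<unk>") for token in source]


def refine(source: Tokens, draft: Tokens) -> Tokens:
    """Toy second pass: reads the complete draft and fixes subject-verb
    agreement. The source is accepted for interface parity with a real
    refiner but is not used by this toy rule."""
    revised = list(draft)
    for i in range(1, len(revised)):
        if revised[i - 1] in {"he", "she", "it"} and revised[i] == "like":
            revised[i] = "likes"
    return revised


def deliberate(
    source: Tokens,
    draft_fn: Callable[[Tokens], Tokens],
    refine_fn: Callable[[Tokens, Tokens], Tokens],
    max_passes: int = 3,
) -> Tokens:
    """Produce a draft, then revise it until it stops changing or the
    pass budget is used up."""
    draft = draft_fn(source)
    for _ in range(max_passes - 1):
        revised = refine_fn(source, draft)
        if revised == draft:
            break
        draft = revised
    return draft


draft_only = first_pass(["他", "喜欢", "新闻"])
final = deliberate(["他", "喜欢", "新闻"], first_pass, refine)

print("first draft:", " ".join(draft_only))
print("after refinement:", " ".join(final))
```

The first draft is produced one word at a time and misses the agreement error. The refinement pass catches it because it can look at the finished draft as a whole, much as a writer does on a second read.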
This new system has not yet been applied to Microsoft’s commercial translation products such as Microsoft Translator, PowerPoint Presentation Translator, or Cognitive Services, but Huang says his team is working on it.
Researchers still face many challenges in machine translation, particularly with real-time translation and speech-to-speech translation. Microsoft’s milestone positions the company among the global leaders in this busy research field.
Journalist: Tony Peng | Editor: Michael Sarazen