
China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’

Since the May 2020 release of OpenAI’s GPT-3, AI researchers have embraced super-large-scale pretraining models. Packing an epoch-making 175 billion parameters, GPT-3 has achieved excellent performance across multiple natural language processing (NLP) tasks. Despite their size and power, however, such models still lack common sense and cognitive abilities, and so struggle with complex reasoning tasks such as open dialogue, knowledge-based Q&A and visual reasoning.

In a bid to promote the research and development of China’s own large-scale pretraining models and further explore universal intelligence from a more fundamental perspective, the Beijing Academy of Artificial Intelligence (BAAI) recently unveiled Wu Dao 1.0, China’s first homegrown super-scale intelligent model system.

The work was led by BAAI Research Academic Vice President and Tsinghua University Professor Tang Jie, with contributions from a team of more than 100 AI scientists from Peking University, Tsinghua University, Renmin University of China, Chinese Academy of Sciences and other institutes.

Wu Dao 1.0 has initiated large-scale research projects via four related models: Wu Dao – Wen Yuan, Wu Dao – Wen Lan, Wu Dao – Wen Hui, and Wu Dao – Wen Su.

Wu Dao – Wen Yuan is China’s largest-ever pretraining language model, boasting best-in-class processing power for mainstream languages, including Chinese and English. It has surpassed average human performance on benchmarks for text categorization, sentiment analysis, natural language inference, reading comprehension and more. The Wu Dao – Wen Yuan project is designed to explore universal natural language understanding (NLU) techniques and to study brain-inspired language models. With 2.6 billion parameters, the model can perform cognitive activities such as memorization, comprehension, retrieval, numerical calculation and multilingual processing, and it has achieved performance comparable to GPT-3 on 20 Chinese NLP tasks such as open-domain question answering, grammar correction and sentiment analysis.
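To make this fine-tuning workflow concrete, here is a minimal sketch using the open-source Hugging Face transformers library. The publicly available bert-base-chinese checkpoint stands in for Wen Yuan, whose weights are not assumed to be downloadable, and the two labeled sentences are toy data for illustration only.

```python
# Minimal sentiment-analysis fine-tuning sketch with Hugging Face transformers.
# "bert-base-chinese" is a stand-in checkpoint, not Wu Dao - Wen Yuan itself.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2  # binary sentiment: 0 = negative, 1 = positive
)

# Toy labeled examples ("great movie" / "terrible service"); a real run would
# iterate over a full dataset with an optimizer.
texts = ["这部电影很好看", "服务太差了"]
labels = torch.tensor([1, 0])

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**inputs, labels=labels)
outputs.loss.backward()  # one gradient step of the fine-tuning loop
print(outputs.logits.argmax(dim=-1))  # predicted sentiment labels
```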

Wu Dao – Wen Lan, meanwhile, is the first publicly available Chinese universal image-text multimodal pretraining model. This ultra-large-scale multimodal model aims to break through the theoretical challenges of pretraining on multimodal data that combines images, text and video, and to eventually yield industrial-grade Chinese image-text pretraining models and applications that exceed SOTA performance. The current model has 1 billion parameters and is trained on 50 million image-text pairs collected from open sources. Wu Dao – Wen Lan has reached SOTA performance, scoring 5 percent higher than the champion team on the image captioning task of the Chinese public multimodal test set AIC-ICC, and 20 percent higher than the popular UNITER model on the visual entailment task.
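Models of this kind are commonly pretrained with a two-tower contrastive objective that pulls matched image-text pairs together in a shared embedding space. The PyTorch sketch below shows a generic InfoNCE loss as an illustration of that general technique; it is not BAAI’s published Wen Lan method, and the encoder outputs and temperature value are assumptions.

```python
# Generic two-tower contrastive (InfoNCE) loss for image-text pretraining.
# An illustrative sketch of the technique, not Wen Lan's actual implementation.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # pairwise cosine similarities
    targets = torch.arange(logits.size(0))           # matched pairs lie on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy batch: 4 image-text pairs with 256-d embeddings from hypothetical encoders.
loss = contrastive_loss(torch.randn(4, 256), torch.randn(4, 256))
print(loss.item())
```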

Wu Dao – Wen Hui is an ultra-large-scale cognition-oriented pretraining model that targets a series of essential problems in general artificial intelligence from a cognitive perspective, aiming to develop and enhance the logic-, consciousness- and reasoning-based cognitive capabilities of pretraining models. Wu Dao – Wen Hui has reached 11.3 billion parameters and, with simple fine-tuning, can generate poetry, make videos, draw pictures, retrieve text, perform complex reasoning and more. BAAI says the model achieves near-human performance on poetry generation in Turing tests.
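As an illustration of this kind of prompt-driven generation, the sketch below uses the transformers library with the publicly available uer/gpt2-chinese-cluecorpussmall checkpoint as a stand-in; Wen Hui itself is not assumed to be accessible, and the prompt is simply the opening line of a classical Chinese poem.

```python
# Prompt-driven Chinese text generation sketch with a publicly available
# GPT-2 checkpoint standing in for Wu Dao - Wen Hui.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "uer/gpt2-chinese-cluecorpussmall"  # stand-in model, not Wen Hui
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "白日依山尽，"  # "The white sun sets behind the mountains," (Wang Zhihuan)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```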

[Image: Poetry generation by Wu Dao – Wen Hui]
[Image: Drawings by Wu Dao – Wen Hui]

Wu Dao – Wen Su is a large-scale pretraining model for biomolecular structure prediction. It can handle super-long biomolecular sequences, on which it has achieved SOTA performance, interpretability and robustness. Based on Google’s BERT language model, Wu Dao – Wen Su has completed protein training on the 100 GB UniParc database, as well as gene training on 5–100,000 human peripheral blood immune cells (25–30 cell types) and 10,000 drug-resistant bacteria.
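Since Wen Su builds on BERT, its pretraining presumably relies on a masked-token objective over biomolecular sequences. The dependency-free sketch below shows such masking applied to a protein sequence; the 20-letter amino-acid vocabulary and 15 percent masking rate are standard BERT-style assumptions, not Wen Su’s documented settings.

```python
# BERT-style masked-token preparation for a protein sequence.
# Vocabulary and masking rate are standard assumptions, not Wen Su's settings.
import random

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
MASK = "<mask>"

def mask_sequence(seq: str, rate: float = 0.15):
    """Mask ~15% of residues; return model inputs and recovery targets."""
    tokens, labels = [], []
    for residue in seq:
        if residue in AMINO_ACIDS and random.random() < rate:
            tokens.append(MASK)
            labels.append(residue)  # the model must recover this residue
        else:
            tokens.append(residue)
            labels.append(None)     # unmasked positions are not scored
    return tokens, labels

# Toy fragment of a hypothetical protein sequence.
tokens, labels = mask_sequence("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(tokens)
print(labels)
```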

The BAAI research team has also summarized Wu Dao 1.0’s key contributions.

BAAI Research is currently in discussions with Sogou, 360, Alibaba, Zhipu.AI, Xinhua News Agency and others regarding model applications. The team also plans to build API interfaces to support high-concurrency, high-speed inference for enterprise and individual users.


Author: Hecate He | Editor: Michael Sarazen


