Tag: Multimodal Learning

AI | Computer Vision & Graphics | Machine Learning & Data Science | Research

Meta AI’s OMNIVORE: A Modality-Agnostic Single Vision Model With Cross-Modal Generalization

A Meta AI research team presents OMNIVORE, a single vision model that handles multiple visual modalities (images, videos, and single-view 3D data), performs cross-modal generalization, and achieves performance on par with or better than traditional modality-specific models of the same size.
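
The core idea is easy to picture in code: each modality gets its own patch embedder, but a single shared transformer trunk handles everything downstream. Below is a minimal PyTorch sketch of that pattern; the class name, layer sizes, and head names are illustrative, and OMNIVORE itself builds on a Swin Transformer trunk rather than the plain encoder used here.

```python
import torch
import torch.nn as nn

class OmnivoreSketch(nn.Module):
    """Minimal sketch of a modality-agnostic vision model: every modality
    is patchified into the same token space, then one shared transformer
    trunk processes all of them. (OMNIVORE uses a Swin Transformer trunk;
    the plain encoder here is a simplification.)"""

    def __init__(self, dim=768, depth=12, heads=12):
        super().__init__()
        # Modality-specific patch embedders project inputs to a common width.
        self.embed_image = nn.Conv2d(3, dim, kernel_size=16, stride=16)   # RGB patches
        self.embed_video = nn.Conv3d(3, dim, kernel_size=(2, 16, 16),
                                     stride=(2, 16, 16))                  # spatio-temporal patches
        self.embed_rgbd = nn.Conv2d(4, dim, kernel_size=16, stride=16)    # RGB + depth channel
        # One trunk shared by every modality.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=depth)
        # Per-dataset classification heads (class counts are illustrative).
        self.heads = nn.ModuleDict({
            "image": nn.Linear(dim, 1000),  # e.g. ImageNet
            "video": nn.Linear(dim, 400),   # e.g. Kinetics
            "rgbd": nn.Linear(dim, 19),     # e.g. SUN RGB-D
        })

    def forward(self, x, modality):
        embed = {"image": self.embed_image,
                 "video": self.embed_video,
                 "rgbd": self.embed_rgbd}[modality]
        tokens = embed(x).flatten(2).transpose(1, 2)  # (B, num_patches, dim)
        feats = self.trunk(tokens).mean(dim=1)        # global average pooling
        return self.heads[modality](feats)


# The same weights classify an image batch and a video clip batch.
model = OmnivoreSketch()
img_logits = model(torch.randn(2, 3, 224, 224), modality="image")
vid_logits = model(torch.randn(2, 3, 8, 224, 224), modality="video")
```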

AI | Machine Learning & Data Science | Research

Baidu’s 10-Billion Scale ERNIE-ViLG Unified Generative Pretraining Framework Achieves SOTA Performance on Bidirectional Vision-Language Generation Tasks

Baidu researchers propose ERNIE-ViLG, a 10-billion-parameter pretraining framework for bidirectional text-image generation. Pretrained on 145 million Chinese image-text pairs, ERNIE-ViLG achieves state-of-the-art performance on both text-to-image and image-to-text generation tasks.
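
ERNIE-ViLG frames both directions as autoregressive sequence generation over one shared vocabulary, with images discretized into tokens by a VQ-VAE-style tokenizer. The PyTorch sketch below illustrates that unified framing under those assumptions; the class name, vocabulary sizes, and layer sizes are illustrative, not Baidu's implementation.

```python
import torch
import torch.nn as nn

class UnifiedGenSketch(nn.Module):
    """Sketch of unified bidirectional text-image generation: text tokens
    and discrete image tokens (from a VQ-VAE-style image tokenizer) share
    one vocabulary, and a single autoregressive transformer serves both
    directions by ordering the sequence as source tokens, then target."""

    def __init__(self, text_vocab=30000, image_vocab=8192,
                 dim=512, depth=6, heads=8, max_len=1024):
        super().__init__()
        self.tok = nn.Embedding(text_vocab + image_vocab, dim)  # unified vocab
        self.pos = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.lm_head = nn.Linear(dim, text_vocab + image_vocab)

    def forward(self, seq):
        # seq: (B, L) token ids. Text-to-image puts text tokens first;
        # image-to-text puts image tokens first, so one model learns both.
        L = seq.size(1)
        h = self.tok(seq) + self.pos(torch.arange(L, device=seq.device))
        causal = torch.triu(torch.ones(L, L, dtype=torch.bool,
                                       device=seq.device), diagonal=1)
        h = self.blocks(h, mask=causal)  # causal mask -> autoregressive
        return self.lm_head(h)           # next-token logits, unified vocab
```

At sampling time, text-to-image generation conditions on the text prefix, samples image tokens autoregressively, and decodes them back to pixels with the image tokenizer's decoder; reversing the order yields image-to-text captioning from the same weights.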

AI | Machine Learning & Data Science | Popular | Research

Google, Cambridge U & Alan Turing Institute Propose PolyViT: A Universal Transformer for Image, Video, and Audio Classification

A research team from Google Research, the University of Cambridge, and the Alan Turing Institute proposes PolyViT, a single transformer model capable of processing multiple modalities and datasets. PolyViT is parameter-efficient and learns representations that generalize across multiple domains.
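
PolyViT gets its parameter efficiency by sharing one transformer trunk across image, video, and audio (audio enters as spectrogram patches) while keeping per-modality tokenizers and per-task heads, and it co-trains by running one task per optimization step. The sketch below shows such a co-training step, assuming a model with the same interface as the OmnivoreSketch above; the weighted task-sampling schedule is one of several the paper evaluates, and the helper names are hypothetical.

```python
import random
import torch
import torch.nn.functional as F

def cotrain_step(model, optimizer, batch_iters, sampling_weights):
    """One co-training step in the PolyViT spirit: sample a task (here
    weighted, e.g. proportionally to dataset size), draw a batch from
    that task only, and update the shared trunk plus that task's head.

    batch_iters: dict mapping task name -> infinite iterator of (x, y)
    sampling_weights: per-task sampling probabilities (same order)."""
    task = random.choices(list(batch_iters), weights=sampling_weights, k=1)[0]
    inputs, labels = next(batch_iters[task])
    logits = model(inputs, modality=task)   # shared trunk, task-specific head
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return task, loss.item()
```

Because each step touches only one task's head while every step updates the shared trunk, the trunk is forced to learn representations that serve all modalities at once, which is the source of the cross-domain generalization the summary describes.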