Synced invited Dr. Linchao Zhu, a lecturer at the ReLER Lab, University of Technology Sydney, whose work focuses on video representation learning, to share his thoughts on the paper Text-to-Image Generation Grounded by Fine-Grained User Attention.
Google Brain’s Switch Transformer language model packs a whopping 1.6 trillion parameters while effectively controlling computational cost. The model achieved a 4x pretraining speedup over a strongly tuned T5-XXL baseline.
Recent AI research on speech separation has explored ways to associate lip motions in videos with audio, but this approach suffers when speakers’ lips are occluded, which they often are in busy multi-speaker environments.
VOGUE is an AI-powered optimization method that deforms garments to fit a given body shape while preserving pattern and material details, delivering state-of-the-art photorealistic, high-resolution try-on images.
In the new paper Canonical Capsules: Unsupervised Capsules in Canonical Pose, Turing Award laureate Dr. Geoffrey Hinton and a team of researchers propose a capsule-based architecture for unsupervised learning on 3D point clouds.
At AWS re:Invent, Amazon Web Services, Inc., an Amazon.com company, announced Amazon Monitron, Amazon Lookout for Equipment, the AWS Panorama Appliance, the AWS Panorama SDK, and Amazon Lookout for Vision.
OpenAI’s groundbreaking GPT-3 language model paper, a no-regret learning dynamics study from Politecnico di Milano & Carnegie Mellon University, and a UC Berkeley work on data summarization have been named the NeurIPS 2020 Best Paper Award winners.
Researchers from the City University of Hong Kong and SenseTime propose MODNet, a lightweight matting objective decomposition network that performs real-time human matting from a single input image against diverse and dynamic backgrounds.