Facebook, Georgia Tech & OSU ‘ViLBERT’ Achieves SOTA on Vision-and-Language Tasks

A team of researchers from the Georgia Institute of Technology, Facebook AI Research and Oregon State University has proposed ViLBERT (Vision-and-Language BERT), a model for visual grounding that learns task-agnostic joint representations of image content and natural language.