Facebook announced today that it is open-sourcing Pythia, a deep learning framework for vision and language multimodal research that enables researchers to “more easily build, reproduce, and benchmark AI models.”
Pythia is built on PyTorch and was designed for Visual Question Answering (VQA) research, for example answering questions about visual data and automatically generating image captions. Facebook lists the following Pythia features:
- Model Zoo: Reference implementations for state-of-the-art vision and language models, including LoRRA (SoTA on VQA and TextVQA), Pythia model (VQA 2018 challenge winner) and BAN.
- Multi-Tasking: Support for multi-tasking which allows training on multiple datasets together.
- Datasets: Built-in support for various datasets, including VQA, VizWiz, TextVQA and VisualDialog.
- Modules: Provides implementations for many commonly used layers in the vision and language domain.
- Distributed: Support for distributed training based on DataParallel as well as DistributedDataParallel.
- Unopinionated: Unopinionated about the dataset and model implementations built on top of it.
- Customization: Custom losses, metrics, scheduling, optimizers, tensorboard; suits users’ custom needs.
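To make the multi-tasking item above more concrete, here is a minimal sketch of one common way to train on several datasets together: split each dataset into batches, tag every batch with its source, and shuffle them into a single mixed epoch. This is an illustration only and is not taken from Pythia's code; the function name `interleave_batches`, the toy dataset contents, and the shuffling strategy are all assumptions.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def interleave_batches(
    datasets: Dict[str, Sequence],
    batch_size: int,
    seed: int = 0,
) -> Iterator[Tuple[str, List]]:
    """Yield (dataset_name, batch) pairs, mixing batches from all datasets.

    Hypothetical helper for illustration; not part of Pythia's API.
    """
    rng = random.Random(seed)
    # Split each dataset into consecutive batches and tag each with its source.
    tagged = [
        (name, list(data[i:i + batch_size]))
        for name, data in datasets.items()
        for i in range(0, len(data), batch_size)
    ]
    # Shuffle so one epoch visits every batch exactly once, in mixed order.
    rng.shuffle(tagged)
    yield from tagged


# Toy stand-ins for real datasets (hypothetical sample IDs).
toy = {
    "vqa": [f"vqa_{i}" for i in range(5)],
    "textvqa": [f"textvqa_{i}" for i in range(3)],
}
epoch = list(interleave_batches(toy, batch_size=2))
for name, batch in epoch:
    # A real training loop would route each batch to the loss for its task.
    print(name, batch)
```

Tagging each batch with its dataset name lets a training loop pick the matching loss and metrics per task, which is the kind of flexibility the feature list describes.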
Facebook believes the framework will enable vision and language researchers to prototype and experiment faster: “This work should also help researchers develop adaptive AI that synthesizes multiple kinds of understanding into a more context-based, multimodal understanding.”
Pythia’s open-sourcing reflects Facebook’s strategy of inviting researchers to stand on its shoulders to advance AI. Pythia can, for example, be used as a starter codebase for vision and language tasks such as the VQA challenge.
Facebook says it will soon add associated Pythia tools, tasks, datasets, and reference models, and hopes the easy-to-use framework will inspire and accelerate the AI community’s research innovations.
The open-sourced Pythia framework is available on GitHub.
Journalist: Fangyu Cai | Editor: Michael Sarazen