In the new paper TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs, a Microsoft research team proposes TaskMatrix.AI, a novel ecosystem that connects foundation models with millions of existing models and system APIs to build a “super-AI” capable of addressing a wide range of digital and physical tasks.
In the new paper ClimaX: A Foundation Model for Weather and Climate, a team from Microsoft Autonomous Systems and Robotics Research, Microsoft Research AI4Science and the University of California at Los Angeles presents ClimaX, a foundation model for weather and climate that can be efficiently adapted for general-purpose tasks related to the Earth’s atmosphere.
In the new paper Foundation Transformers, a Microsoft team proposes a method for true general-purpose modelling. Their Foundation Transformer is a single unified transformer that provides guaranteed training stability and can handle diverse tasks and modalities without performance degradation.
In the new paper Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks, a Microsoft research team presents BEiT-3, a general-purpose state-of-the-art multimodal foundation model for both vision and vision-language tasks that advances the big convergence of backbone architectures, pretraining tasks, and model scaling.
A research team from Sun Yat-sen University and UBTECH proposes a unified approach for justifying, analyzing, and improving foundation models in the new paper Big Learning: A Universal Machine Learning Paradigm? The team’s big learning framework can model many-to-all joint, conditional, and marginal data distributions and offers exceptional flexibility in both data and tasks.
A Facebook AI Research team presents FLAVA, a foundational language and vision alignment model that explicitly targets language, vision, and their multimodal combination all at once, achieving impressive performance on 35 tasks across the vision, language, and multimodal domains.
In the paper Florence: A New Foundation Model for Computer Vision, a Microsoft research team proposes Florence, a novel foundation model for computer vision that significantly outperforms previous large-scale pretraining approaches and achieves new state-of-the-art results across a wide range of visual and visual-linguistic benchmarks.