Humans’ development and use of complex language and tools are two fundamental ways we differ from other animals. The recent emergence of foundational AI models trained on massive amounts of unlabelled data and capable of generating humanlike text outputs for various tasks has some speculating they may be the path to artificial general intelligence (AGI). However, despite their game-changing performance, foundation models can still struggle with domain-specific tasks such as mathematical calculations.
Maybe these foundation models simply need the right tools to take the next evolutionary leap?
In the new paper TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs, a Microsoft research team proposes TaskMatrix.AI, a novel ecosystem that connects foundation models with millions of existing models and system APIs to build a “super-AI” capable of addressing a wide range of digital and physical tasks.
While many existing symbolic- or neural-based AI models and systems are able to efficiently address domain-specific tasks, their different implementations and working mechanisms and various compatibility issues can make them difficult for foundation models to access. This work aims to solve that.
The team summarizes the key advantages of TaskMatrix.AI as follows:
- TaskMatrix.AI can perform both digital and physical tasks by using the foundation model as a core system that first understands different types of inputs (such as text, image, video, audio, and code) and then generates code that calls APIs to complete the task.
- TaskMatrix.AI has an API platform as a repository of various task experts. All the APIs on this platform have a consistent documentation format that makes them easy for the foundation model to use and for developers to add new ones.
- TaskMatrix.AI has a powerful lifelong learning ability, as it can expand its skills to deal with new tasks by adding new APIs with specific functions to the API platform.
- TaskMatrix.AI has better interpretability for its responses, as both the task-solving logic (i.e., action codes) and the outcomes of the APIs are understandable.
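To make the idea of a consistent documentation format concrete, here is a minimal Python sketch of what one entry on such a platform might look like. The class names, field names and the `insert_logo` example are all assumptions made for illustration; they are not the schema defined in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ApiDoc:
    """One documentation entry (hypothetical fields, not the paper's schema)."""

    name: str
    description: str
    parameters: dict[str, str] = field(default_factory=dict)  # name -> meaning
    usage_example: str = ""

    def to_prompt(self) -> str:
        """Render the entry as plain text that could be shown to a foundation model."""
        lines = [f"API: {self.name}", f"Description: {self.description}"]
        for param, meaning in self.parameters.items():
            lines.append(f"  - {param}: {meaning}")
        if self.usage_example:
            lines.append(f"Example: {self.usage_example}")
        return "\n".join(lines)


class ApiPlatform:
    """Toy in-memory registry where developers register, update and delete entries."""

    def __init__(self) -> None:
        self._docs: dict[str, ApiDoc] = {}

    def register(self, doc: ApiDoc) -> None:
        if doc.name in self._docs:
            raise ValueError(f"API already registered: {doc.name}")
        self._docs[doc.name] = doc

    def update(self, doc: ApiDoc) -> None:
        if doc.name not in self._docs:
            raise KeyError(doc.name)
        self._docs[doc.name] = doc

    def delete(self, name: str) -> None:
        del self._docs[name]

    def get(self, name: str) -> ApiDoc:
        return self._docs[name]

    def names(self) -> list[str]:
        return sorted(self._docs)


if __name__ == "__main__":
    platform = ApiPlatform()
    platform.register(
        ApiDoc(
            name="insert_logo",  # hypothetical API name
            description="Insert a company logo on the current slide",
            parameters={"company": "Name of the company"},
            usage_example='insert_logo(company="Contoso")',
        )
    )
    print(platform.get("insert_logo").to_prompt())
```

Because every entry renders through the same `to_prompt` method, adding a new skill in this sketch only requires registering another `ApiDoc`, which mirrors the extensibility the team describes.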
TaskMatrix.AI comprises four key components: 1) A Multimodal Conversational Foundation Model (MCFM) communicates with users, understands their goals and multimodal context, and generates API-based executable code for performing specific tasks; 2) An API Platform provides a unified API documentation schema to store millions of APIs and lets developers register, update and delete their APIs; 3) An API Selector recommends related APIs based on the MCFM’s understanding of users’ goals; and 4) An API Executor executes the generated action codes by calling the relevant APIs and returns the results.
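The flow between these four components can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the three arithmetic "APIs", the word-overlap selector, and especially `generate_action_code`, which stands in for the MCFM with a hard-coded rule where a real system would prompt a foundation model.

```python
from __future__ import annotations

import re
from typing import Any, Callable

# API platform: name -> (description, callable). Toy entries, not real TaskMatrix.AI APIs.
API_PLATFORM: dict[str, tuple[str, Callable[..., Any]]] = {
    "add": ("add two numbers", lambda a, b: a + b),
    "subtract": ("subtract two numbers", lambda a, b: a - b),
    "multiply": ("multiply two numbers", lambda a, b: a * b),
}


def select_apis(goal: str, top_k: int = 2) -> list[str]:
    """API selector: rank APIs by word overlap between the goal and each description."""
    goal_words = set(goal.lower().split())
    scored = [
        (len(goal_words & set(description.split())), name)
        for name, (description, _) in API_PLATFORM.items()
    ]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score > 0][:top_k]


def generate_action_code(goal: str, candidates: list[str]) -> list[tuple[str, dict[str, Any]]]:
    """Stand-in for the MCFM: turn the goal into a list of (api_name, arguments) calls.

    This rule just pairs the best candidate with the first two integers in the goal.
    """
    numbers = [int(n) for n in re.findall(r"-?\d+", goal)]
    if candidates and len(numbers) >= 2:
        return [(candidates[0], {"a": numbers[0], "b": numbers[1]})]
    return []


def execute(action_code: list[tuple[str, dict[str, Any]]]) -> list[Any]:
    """API executor: run each call against the platform and collect the results."""
    results = []
    for name, kwargs in action_code:
        _, func = API_PLATFORM[name]
        results.append(func(**kwargs))
    return results


def run_task(goal: str) -> list[Any]:
    """Full loop: select candidate APIs, generate action code, execute it."""
    candidates = select_apis(goal)
    action_code = generate_action_code(goal, candidates)
    return execute(action_code)


if __name__ == "__main__":
    print(run_task("multiply 6 and 7"))  # prints [42]
```

The action code is kept as plain data (a list of API names and arguments) rather than executed as free-form text, which makes each step easy to inspect; this is a design choice of the sketch, not a claim about how TaskMatrix.AI represents its action codes.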
The team also applies reinforcement learning with human feedback (RLHF) techniques to train a reward model and optimize TaskMatrix.AI with knowledge and insights gained from humans. This approach assists the MCFM and API selector in finding the optimal policy, speeds up convergence and results in better performance on complex tasks.
In their empirical study, the team applied TaskMatrix.AI to the task of automatically generating PowerPoint slides, using ChatGPT as the MCFM. In the experiment, TaskMatrix.AI broke the task down into about 25 API calls and successfully generated multiple slides covering different companies. TaskMatrix.AI also exhibited an understanding of user instructions based on PowerPoint content that enabled it, for example, to generate pages based on a company list and insert an appropriate logo based on the title of each page.
Overall, this work demonstrates TaskMatrix.AI’s ability to improve performance on diverse tasks by connecting foundation models to existing APIs. The team believes that, together with the continued development of foundation models, cloud services, robotics, and the Internet of Things, TaskMatrix.AI has the potential to help build “an amazing future world, where productivity and creativity can reach new levels.”
The paper TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs is on arXiv.
Author: Hecate He | Editor: Michael Sarazen