Autonomous agents powered by large language models (LLMs) have garnered considerable research attention. However, the open-source community faces significant hurdles in developing specialized models for agent tasks, primarily due to the limited availability of high-quality datasets and the lack of standardized protocols in this field.
In a new paper xLAM: A Family of Large Action Models to Empower AI Agent Systems, a Salesforce AI Research team
presents the xLAM series, a collection of large action models designed to enhance the performance of open-source LLMs for autonomous AI agents. This work aims to accelerate innovation in the field and make high-performance models for agent tasks more accessible.
The xLAM models are designed for various applications, with smaller models (1B and 7B) optimized for on-device deployments and larger models (8x7B and 8x22B) geared towards more complex tasks. The training pipeline for xLAM encompasses several key stages, including data unification, augmentation, quality verification, general instruction data synthesis, and preference data generation.
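As a rough illustration of how these stages fit together, the pipeline can be sketched as a sequence of composable transformations. The stage names and function signatures below are my own shorthand, not the paper's actual implementation:

```python
# Hypothetical sketch of the xLAM data pipeline; stage names and record
# fields are illustrative, not the paper's actual implementation.

def unify(records):
    """Map raw agent data into a shared format."""
    return [dict(r, unified=True) for r in records]

def augment(records):
    """Enrich the pool with synthetic variants of each record."""
    return records + [dict(r, synthetic=True) for r in records]

def verify(records):
    """Drop records that fail a simple quality check."""
    return [r for r in records if r.get("query")]

def run_pipeline(raw):
    # Stages applied in the order described in the paper:
    # unification -> augmentation -> quality verification.
    records = raw
    for stage in (unify, augment, verify):
        records = stage(records)
    return records
```

The point of the structure is that each stage consumes and produces records in the same shape, so stages can be added or swapped without touching the rest of the pipeline.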
A standout feature of the xLAM pipeline is its data unification process, which standardizes data using several modules: task instructions, available tools, format guidelines, few-shot examples, queries, and steps. This unified format ensures compatibility across a wide range of environments and tasks, allowing the pipeline to scale and adapt to different datasets. The modular structure also facilitates precise data augmentation and thorough quality verification, which are essential for improving the quality of agent data.
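To make the modular format concrete, here is a hypothetical record laid out along the modules listed above. The field names, the example tool, and the serialization are assumptions for illustration; the paper's actual schema may differ:

```python
# Hypothetical record in xLAM's unified data format. Field names mirror
# the modules described in the article; the actual schema may differ.
unified_record = {
    "task_instruction": "You are an assistant that can call tools to answer questions.",
    "available_tools": [
        {
            "name": "get_weather",  # illustrative tool, not from the paper
            "description": "Look up current weather for a city.",
            "parameters": {"city": {"type": "string"}},
        }
    ],
    "format_guidelines": 'Respond with JSON: {"tool": ..., "arguments": ...}.',
    "few_shot_examples": [],  # optional in-context demonstrations
    "query": "What's the weather in Berlin?",
    "steps": [  # the agent's trajectory for this query
        {
            "thought": "I should call the weather tool.",
            "tool_call": {"tool": "get_weather", "arguments": {"city": "Berlin"}},
        }
    ],
}
```

Because every dataset is mapped into the same slots, downstream augmentation and verification code only needs to understand one layout.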
The data augmentation strategy focuses on increasing dataset diversity by applying various transformations and generating synthetic data to enrich the training pool. The team employed two key augmentation techniques: prompt format augmentation and instruction-following augmentation. Prompt format augmentation creates different prompt structures from the same unified data format, while instruction-following augmentation strengthens the model's ability to follow diverse instructions, boosting its overall capability.
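A minimal sketch of prompt format augmentation, assuming unified records with fields like those shown earlier: the same record is rendered into several different prompt layouts. The templates and field names are illustrative, not the paper's actual ones:

```python
# Sketch of prompt format augmentation: render one unified record into
# several prompt layouts. Templates and field names are illustrative.
import json

TEMPLATES = [
    # Plain sectioned layout
    "{instruction}\n\nTools:\n{tools}\n\nUser: {query}",
    # Markdown-style layout
    "## Task\n{instruction}\n## Tools\n{tools}\n## Query\n{query}",
    # Tagged layout
    "<task>{instruction}</task><tools>{tools}</tools><query>{query}</query>",
]

def augment_prompt_formats(record):
    """Return one prompt string per template for the same record."""
    tools = json.dumps(record["available_tools"])
    return [
        t.format(
            instruction=record["task_instruction"],
            tools=tools,
            query=record["query"],
        )
        for t in TEMPLATES
    ]
```

Training on several renderings of the same underlying data helps the model generalize across prompt styles rather than memorizing one.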
The researchers also introduce multiple agent models tailored to specific use cases. The flagship xLAM series is built on the Mixtral Instruct models, aiming to deliver balanced performance across a wide array of agent tasks, from complex multi-turn dialogues to function-calling applications. In addition to the general-purpose xLAM models, the team developed two specialized models for function-calling tasks, xLAM-7B-fc-r and xLAM-1B-fc-r, built on DeepSeek-Coder-7B-instruct-v1.5 and DeepSeek-Coder-1.3B-instruct, respectively.
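The job of a function-calling model like xLAM-7B-fc-r is to emit a structured tool call for a user query, which the host application then parses and executes. A hedged sketch of the parsing side, assuming a `{"tool": ..., "arguments": {...}}` JSON shape (an assumption for illustration, not the models' documented output format):

```python
import json

def parse_tool_call(model_output: str):
    """Parse a JSON tool call emitted by a function-calling model.

    Assumes the model responds with {"tool": ..., "arguments": {...}};
    the real xLAM-fc output format may differ.
    """
    call = json.loads(model_output)
    if "tool" not in call or not isinstance(call.get("arguments"), dict):
        raise ValueError("not a well-formed tool call")
    return call["tool"], call["arguments"]
```

Validating the structure before dispatching keeps a malformed generation from being executed as a real tool invocation.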
Experimental evaluations show that xLAM consistently achieves top-tier results across various benchmarks for agent capabilities. Notably, it secured the top position on the Berkeley Function-Calling Leaderboard, outperforming leading models like GPT-4 and Claude-3 in tool usage tasks.
The code is available on the project's GitHub. The paper xLAM: A Family of Large Action Models to Empower AI Agent Systems is on arXiv.
Author: Hecate He | Editor: Chain Zhang

