Outperforming Giants: TinyAgent’s Edge-Based Solution Surpasses GPT-4-Turbo

Synced

2 years ago

Recent advancements in large language models (LLMs) have enabled the creation of sophisticated agentic systems that utilize tools and APIs to answer user queries through function calling. However, deploying these models on edge devices remains largely unexplored due to their significant size and high computational requirements, which generally necessitate cloud-based infrastructure.

In a new paper, TinyAgent: Function Calling at the Edge, a research team from UC Berkeley and ICSI introduces TinyAgent, a comprehensive framework for training and deploying small, task-specific language models that can perform function calls for agentic systems at the edge. Remarkably, TinyAgent outperforms larger models such as GPT-4-Turbo in this specific function-calling ability.

The research highlights that smaller models, when trained on specialized and high-quality datasets, can effectively perform complex tasks without relying on extensive world knowledge. The primary objective of this work is to develop Small Language Models (SLMs) that can be securely and privately deployed on edge devices, while still possessing the reasoning skills needed to comprehend natural language inputs and coordinate tools and APIs to complete user requests.

To achieve this, the team first focuses on enabling small open-source models to execute precise function calls, a critical element for agentic systems. They also emphasize the importance of curating tailored datasets specifically for function calling, using a Mac assistant agent as a case study. The researchers then enhance the performance of these models by incorporating a novel approach called ToolRAG, along with quantization techniques, to improve inference efficiency and ensure real-time responses in edge deployments.
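To make the idea of quantization concrete, the following is a minimal sketch of symmetric 8-bit quantization applied to a short list of weights. The article does not say which quantization scheme the authors use, so this generic scheme, the function names `quantize_int8` and `dequantize`, and the sample weights are all assumptions chosen for illustration only.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization (illustrative, not the paper's scheme).

    Maps floats to integers in [-127, 127] using a single scale factor.
    """
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in quantized]


# Hypothetical weights, used only to show the round trip.
weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

Each weight is stored as a single byte plus one shared scale, and the round-trip error per weight is at most about half the scale. Trading a small loss of precision for a smaller memory footprint in this way is what makes quantized models attractive for edge devices.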

In essence, the success of TinyAgent hinges on four key components: (i) leveraging the LLMCompiler framework to train off-the-shelf SLMs for function calling, (ii) creating high-quality datasets tailored to specific tasks, (iii) fine-tuning these models using the curated data, and (iv) optimizing deployment through ToolRAG to reduce prompt size by selecting only the necessary tools based on user input, combined with quantized models to minimize resource usage during inference.
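The tool-selection step in (iv) can be sketched as a simple retrieval problem: score each tool description against the user query and keep only the best matches in the prompt. The sketch below uses bag-of-words cosine similarity purely for illustration; the article does not describe how ToolRAG scores tools, and the tool names, descriptions, and helper functions here are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical tool catalogue; names and descriptions are illustrative only.
TOOLS = {
    "create_calendar_event": "create a calendar event or meeting with a title, date and attendees",
    "send_email": "compose and send an email message to a recipient",
    "open_maps": "open maps and show directions to a location or address",
    "create_note": "create a note with text content in the notes app",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_tools(query, tools, top_k=2):
    """Return the names of the top_k tools most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        tools,
        key=lambda name: cosine(q, Counter(tokenize(tools[name]))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, tools, top_k=2):
    """Build a prompt that lists only the selected tools."""
    chosen = select_tools(query, tools, top_k)
    lines = [f"- {name}: {tools[name]}" for name in chosen]
    return "Available tools:\n" + "\n".join(lines) + f"\nUser: {query}"


print(build_prompt("Send an email to Bob about the meeting", TOOLS, top_k=1))
```

Because only the selected tool descriptions are included, the prompt stays short even as the catalogue grows, which is the effect the article attributes to ToolRAG. A real system would likely use a learned retriever or classifier rather than word overlap.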

Empirical results show that the 1.1B and 7B TinyAgent models achieved success rates of 80.06% and 84.95%, respectively, surpassing GPT-4-Turbo’s success rate of 79.08% on the same task. These findings demonstrate that TinyAgent can not only rival but exceed the function-calling capabilities of a much larger model on this task, all while being deployable at the edge.

The paper TinyAgent: Function Calling at the Edge is on arXiv.

Author: Hecate He | Editor: Chain Zhang
