AI Machine Learning & Data Science Research

Stanford & CZ Biohub’s TEXTGRAD: Transforming AI Optimization with Textual Feedback

In the new paper “TextGrad: Automatic ‘Differentiation’ via Text,” a research team from Stanford University and CZ Biohub introduces TEXTGRAD, a robust framework that performs automatic differentiation through text. In this system, LLMs generate comprehensive, natural-language suggestions to optimize variables in computation graphs.

AI is undergoing a transformative shift, with major advances driven by compound systems that integrate multiple large language models (LLMs) and other complex components. Developing systematic, automated optimization methods for these compound AI systems has therefore become a critical challenge, and one that is essential for realizing AI’s full potential.

In response to this need, a research team from Stanford University and Chan Zuckerberg Biohub has introduced TEXTGRAD in their new paper, “TextGrad: Automatic ‘Differentiation’ via Text.” TEXTGRAD is a robust framework that performs automatic differentiation through text. In this system, LLMs generate comprehensive, natural language suggestions to optimize variables in computation graphs, which can range from code snippets to molecular structures.

TEXTGRAD is founded on three core principles:

  1. It is a versatile and high-performance framework, not tailored to a specific application domain.
  2. It is user-friendly, mirroring PyTorch’s abstractions to ease knowledge transfer.
  3. It is fully open-source.

Within the TEXTGRAD framework, differentiation and gradients serve as metaphors for the textual feedback from LLMs. Each AI system is represented as a computation graph, where variables are the inputs and outputs of complex (and potentially non-differentiable) functions. The system provides ‘textual gradients’—informative and interpretable natural language feedback—that suggest how variables should be adjusted to enhance the system. These gradients propagate through various functions, including LLM API calls, simulators, or external numerical solvers.
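
To make the PyTorch analogy concrete, here is a minimal sketch of how a single free-text variable can be optimized with textual gradients. It follows the usage pattern shown in the project’s README; the engine name, the example question, and the evaluation instruction are illustrative, and API names such as tg.BlackboxLLM, tg.TextLoss, and tg.TGD should be checked against the released textgrad package.

```python
# Minimal TextGrad-style loop: optimize one free-text variable with textual gradients.
# Sketch based on the project README; verify names against the installed textgrad package.
import textgrad as tg

# The "backward engine" is the LLM that writes the textual gradients (the feedback).
tg.set_backward_engine("gpt-4o", override=True)

# Forward pass: ask a blackbox LLM for an initial answer.
model = tg.BlackboxLLM("gpt-4o")
question = tg.Variable(
    "If it takes 1 hour to dry 25 shirts under the sun, how long does it take to dry 30 shirts?",
    role_description="question to the LLM",
    requires_grad=False,  # the question itself is not being optimized
)
answer = model(question)
answer.set_role_description("concise and accurate answer to the question")

# "Loss": an LLM-written critique of the answer, expressed in natural language.
loss_fn = tg.TextLoss(
    "Evaluate the given answer to the drying-time question. "
    "Be logical and very critical; provide concise feedback."
)
loss = loss_fn(answer)

# Backward pass and update: the textual gradient attached to `answer` is feedback,
# and the textual-gradient-descent optimizer rewrites the variable accordingly.
loss.backward()
optimizer = tg.TGD(parameters=[answer])
optimizer.step()

print(answer.value)  # the revised answer after one optimization step
```

The parallel with PyTorch is deliberate: Variable plays the role of a tensor, TextLoss of a loss function, and TGD (textual gradient descent) of an optimizer, except that the “gradient” attached to each variable is natural-language feedback rather than a numerical array.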

The research team demonstrated TEXTGRAD’s optimization capabilities across diverse domains, including:

  1. Coding: They enhanced solutions to hard LeetCode problems, achieving a 20% relative performance gain over GPT-4o paired with the best existing method.
  2. Problem Solving: By refining solutions at test time, they improved GPT-4o’s zero-shot performance on the Google-Proof Question Answering (GPQA) benchmark from 51% to 55%.
  3. Reasoning: They optimized prompts to lift GPT-3.5’s performance close to GPT-4 levels on various reasoning tasks (a prompt-optimization sketch follows this list).
  4. Chemistry: They designed new small molecules with desirable drug-like properties and in silico binding affinity to drug targets.
  5. Medicine: They optimized radiation treatment plans for prostate cancer patients to achieve targeted dosages while minimizing side effects.
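
As an illustration of the prompt-optimization setting mentioned above, the following sketch makes the system prompt, rather than the answer, the trainable variable. The tiny training pairs, the evaluation instruction, and the model names are hypothetical placeholders, and the API follows the same README-style usage as the sketch above.

```python
# Sketch of prompt optimization with TextGrad: the system prompt is the trainable
# parameter, and textual gradients from an LLM evaluator update it across examples.
# Hypothetical data and evaluation instruction; verify API names against textgrad.
import textgrad as tg

tg.set_backward_engine("gpt-4o", override=True)  # a stronger model writes the feedback

# Trainable system prompt for the weaker model we want to improve.
system_prompt = tg.Variable(
    "You are a careful reasoning assistant. Think step by step.",
    role_description="system prompt for the task model",
    requires_grad=True,
)
model = tg.BlackboxLLM("gpt-3.5-turbo", system_prompt=system_prompt)
optimizer = tg.TGD(parameters=[system_prompt])

# Hypothetical (question, reference answer) pairs standing in for a training set.
train_set = [
    ("A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. "
     "How much does the ball cost?", "$0.05"),
    ("If 3 machines make 3 widgets in 3 minutes, how long do 100 machines need "
     "to make 100 widgets?", "3 minutes"),
]

for question_text, reference in train_set:
    question = tg.Variable(question_text,
                           role_description="reasoning question",
                           requires_grad=False)
    prediction = model(question)
    prediction.set_role_description("the model's answer to the reasoning question")

    # LLM-based loss: critique the prediction against the known reference answer.
    loss_fn = tg.TextLoss(
        f"The correct answer is: {reference}. "
        "Critique the given answer: is it correct, and is the reasoning sound? "
        "Give concise, actionable feedback."
    )
    loss = loss_fn(prediction)

    optimizer.zero_grad()
    loss.backward()   # textual feedback propagates back to the system prompt
    optimizer.step()  # TGD rewrites the system prompt using that feedback

print(system_prompt.value)  # the updated prompt after one pass over the examples
```

Because the “loss” is only a written critique, the same loop applies whether the variable being optimized is a prompt, a code snippet, a molecule description, or a treatment-plan specification, so long as some evaluator can express feedback about it in text.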

Through TEXTGRAD, the team achieved state-of-the-art results in code optimization and PhD-level question answering, enhanced prompts, and provided proof-of-concept results in scientific applications such as molecule development and treatment plan optimization.

In summary, TEXTGRAD merges the reasoning capabilities of LLMs with the decomposable efficiency of backpropagation, creating a comprehensive framework for optimizing AI systems across various domains.

The code is available on the project’s GitHub. The paper “TextGrad: Automatic ‘Differentiation’ via Text” is on arXiv.


Author: Hecate He | Editor: Chain Zhang


