AI Machine Learning & Data Science Research

Google & UC Berkeley’s ‘Self-Debugging’ Framework Teaches LLMs to Debug Their Own Code

In the new paper Teaching Large Language Models to Self-Debug, a Google Research and UC Berkeley team presents Self-Debugging, a framework that teaches large language models to debug their own predicted code via few-shot demonstrations and improves baseline accuracy by up to 12 percent.

Large language models (LLMs) continue to demonstrate impressive capabilities across a wide range of complex tasks, even proving adept at generating computer code. However, as Brian Kernighan noted in The Elements of Programming Style, “Debugging is twice as hard as writing the code in the first place… if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”

LLMs are no exception. Despite their ability to provide feedback and refine outputs in many natural language processing tasks, these models struggle with checking and correcting computer code without access to external feedback resources such as unit tests or human instructions.

In the new paper Teaching Large Language Models to Self-Debug, a Google Research and UC Berkeley team presents Self-Debugging, a framework for teaching LLMs to debug their own predicted code via few-shot demonstrations. The novel approach improves baseline accuracy by up to 12 percent.

The team’s approach first leverages few-shot prompting, enabling the LLM to tackle tasks given only a handful of input-output demonstrations. Instructions can optionally be added to the demonstration prompt to provide a higher-level task description. An execution-based code selection step then picks the predicted program whose execution result is most frequent among the candidates that run without errors, and Self-Debugging is applied to this selected code.
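A minimal sketch of how such execution-based selection might look is shown below. The `run_candidate` helper, the assumption that each candidate defines a `solution` function, and the single `test_input` are hypothetical placeholders for illustration, not details from the paper; outputs are also assumed to be hashable so they can be counted.

```python
from collections import Counter

def run_candidate(code: str, test_input):
    """Hypothetical helper: execute a candidate program on a test input
    and return its output, or None if execution raises an error."""
    env = {}
    try:
        exec(code, env)                      # define the candidate's function(s)
        return env["solution"](test_input)   # assumes the task asks for a `solution` function
    except Exception:
        return None

def select_by_execution(candidates, test_input):
    """Pick the candidate whose execution result is most frequent among
    candidates that execute without errors (a simple majority vote)."""
    results = {code: run_candidate(code, test_input) for code in candidates}
    ok = {code: out for code, out in results.items() if out is not None}
    if not ok:
        return candidates[0]  # fall back to the first sample if nothing runs
    most_common_output, _ = Counter(ok.values()).most_common(1)[0]
    for code, out in ok.items():
        if out == most_common_output:
            return code
```

The selected program is then handed to the Self-Debugging loop described next.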

The Self-Debugging framework comprises an iterative debugging process. Given a problem description, Self-Debugging first predicts candidate programs, then infers program correctness and generates feedback for subsequent debugging steps. This process continues until one of two termination conditions is met: the feedback shows that the prediction is correct, or the maximum allowed number of debugging turns is reached.
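In code, that loop might be sketched roughly as follows. The `llm` object and its `generate_code`, `generate_feedback`, and `refine_code` methods, the string check for correctness, and the turn limit are all assumed stand-ins for calls to the underlying model, not the paper’s actual implementation.

```python
MAX_DEBUG_TURNS = 10  # assumed budget; the paper caps the number of debugging turns

def self_debug(problem: str, llm) -> str:
    """Sketch of the iterative Self-Debugging loop: predict code, ask the
    model to assess it and produce feedback, then refine until the feedback
    indicates the code is correct or the turn budget is exhausted."""
    code = llm.generate_code(problem)                    # initial candidate program
    for _ in range(MAX_DEBUG_TURNS):
        feedback = llm.generate_feedback(problem, code)  # model inspects its own code
        if "the code is correct" in feedback.lower():    # termination condition 1
            return code
        code = llm.refine_code(problem, code, feedback)  # one debugging turn
    return code                                          # termination condition 2: budget spent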

In their empirical study, the team applied Self-Debugging to several code generation domains, where it improved on the baselines by 2-3 percent on text-to-SQL generation tasks and bettered baseline accuracy by up to 12 percent on code translation and text-to-Python generation tasks.

This paper demonstrates the proposed Self-Debugging framework’s ability to teach LLMs to identify, understand and correct code errors in a manner the team likens to “rubber duck debugging”: the model explains its generated code line by line, as if to a rubber duck, and uses that explanation to find and fix mistakes rather than generating correct code from scratch. The team plans to explore additional techniques to improve the model’s performance at all steps and enable the prediction of more informative error messages.
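As an illustration of this rubber-duck idea, the feedback step in the loop above could prompt the model to walk through its own program line by line before judging it; the prompt wording below is illustrative and not the paper’s exact prompt.

```python
def rubber_duck_prompt(problem: str, code: str) -> str:
    """Build an explanation-style feedback prompt: the model explains its own
    code line by line, then judges whether it solves the stated problem."""
    return (
        f"Problem description:\n{problem}\n\n"
        f"Generated code:\n{code}\n\n"
        "Explain what each line of the code does, step by step, "
        "then state whether the code correctly solves the problem. "
        "If it does not, describe the bug."
    )
```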

The paper Teaching Large Language Models to Self-Debug is on arXiv.


Author: Hecate He | Editor: Michael Sarazen


We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
