In a recent paper, “CodePlan: Repository-level Coding using LLMs and Planning,” a team from Microsoft Research introduces CodePlan, a versatile framework designed to address the complexities of repository-level coding tasks, which involve extensive code changes across large, interconnected codebases.
In a new paper AskIt: Unified Programming Interface for Programming with Large Language Models, an MIT CSAIL research team presents AskIt, a domain-specific language (DSL) tailored for LLMs that accommodates a wide variety of tasks and substantially reduces practitioners’ software development overhead and effort.
In the new paper CodeT5+: Open Code Large Language Models for Code Understanding and Generation, a Salesforce AI Research team presents CodeT5+, a novel family of encoder-decoder code foundation large language models that can be flexibly adapted to a wide range of code understanding and generation tasks and that outperform existing models on various code-related benchmarks.
In the new paper StarCoder: May the Source Be With You!, the BigCode community releases StarCoder and StarCoderBase, 15.5B parameter open-access large language models (LLMs) trained on 80+ programming languages. StarCoderBase outperforms all open multi-programming-language code LLMs, and StarCoder surpasses all models fine-tuned on Python.
In the new paper DocPrompting: Generating Code by Retrieving the Docs, a research team from Carnegie Mellon University and Inspired Cognition presents DocPrompting, a natural-language-to-code generation approach. Tasked with generating code that uses unseen functions or libraries from a natural language intent, DocPrompting retrieves the corresponding code documentation so that the model can learn to perform the task.
In the new paper The Stack: 3 TB of Permissively Licensed Source Code, a team from ServiceNow Research and Hugging Face advances open and responsible research on code LLMs by releasing The Stack, a 3.1 TB dataset of permissively licensed source code in 30 programming languages.
In the new paper Interactive Code Generation via Test-Driven User-Intent Formalization, a team from Microsoft Research, the University of Pennsylvania, and the University of California, San Diego proposes a workflow for test-driven user-intent formalization that leverages user feedback to generate code that is 90.40 percent consistent with user intent.
In the new paper CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning, a Salesforce Research team presents CodeRL, a novel framework for program synthesis tasks that employs pretrained language models (LMs) and deep reinforcement learning (RL) and achieves state-of-the-art performance on the challenging APPS benchmark while also demonstrating impressive zero-shot transfer capabilities.