Large-scale pretrained language models (LMs) have shown promising results on simple code generation tasks, but they face key limitations: training with only a next-token prediction objective leads to error accumulation, and ignoring potentially informative signals from unit tests results in poor generalization to complex, unseen coding tasks.
A Salesforce Research team addresses these issues in the new paper CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning, proposing CodeRL, a novel framework for program synthesis that combines pretrained LMs with deep reinforcement learning (RL). CodeRL achieves state-of-the-art performance on the challenging APPS benchmark while demonstrating impressive zero-shot transfer capabilities.
The team extends the Salesforce CodeT5 (Wang et al., 2021) unified pretrained encoder-decoder transformer architecture as CodeRL’s backbone. Although CodeT5 pretraining tasks such as masked span prediction (MSP) can benefit code understanding tasks, they do not necessarily align with program synthesis objectives. To mitigate this, the team integrates a next-token prediction (NTP) pretraining task into CodeT5: for each code sample, a pivot location is uniformly sampled, the content preceding the pivot is passed to the encoder, and the remaining content becomes the decoder’s target.
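The pivot-based split behind the NTP task can be sketched as follows. This is a minimal illustration, not the paper's implementation; the fraction bounds and function name are assumptions chosen here for readability.

```python
import random

def ntp_split(code_tokens, min_frac=0.1, max_frac=0.9, seed=None):
    """Split a tokenized code sample at a uniformly sampled pivot.

    Returns (encoder_input, decoder_target): tokens before the pivot
    go to the encoder; the rest become the decoder's generation target.
    The min_frac/max_frac bounds are illustrative assumptions, not
    values reported in the paper.
    """
    rng = random.Random(seed)
    lo = max(1, int(len(code_tokens) * min_frac))
    hi = max(lo, int(len(code_tokens) * max_frac))
    pivot = rng.randint(lo, hi)  # uniform pivot within the sample
    return code_tokens[:pivot], code_tokens[pivot:]
```

Feeding the model many such prefix/suffix pairs trains the decoder to continue arbitrary partial programs, which matches the left-to-right nature of program synthesis better than span masking alone.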
The researchers formulate CodeRL’s program synthesis as an RL problem and introduce an actor-critic approach to improve model performance by utilizing the unit test signals in both the model optimization and generation processes.
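The actor-critic idea of turning unit test outcomes into optimization signals can be sketched roughly as below. The reward values, outcome categories, and function signature are hypothetical assumptions for illustration, not the paper's exact scheme; the critic here is abstracted as a scalar baseline estimate.

```python
import numpy as np

# Hypothetical reward mapping from unit-test outcomes to scalars
# (illustrative values, not the paper's).
REWARDS = {"pass": 1.0, "fail": -0.3, "runtime_error": -0.6, "compile_error": -1.0}

def rl_loss(token_logprobs, test_outcome, critic_value):
    """Policy-gradient (actor) loss for one generated program.

    token_logprobs : log-probabilities of the sampled tokens
    test_outcome   : result of executing the program on unit tests
    critic_value   : the critic's estimated return, used as a baseline
    The advantage (reward - baseline) scales the sequence log-likelihood,
    so programs that pass tests are reinforced and failures discouraged.
    """
    advantage = REWARDS[test_outcome] - critic_value
    return -advantage * np.sum(token_logprobs)
```

Minimizing this loss increases the likelihood of programs with positive advantage and decreases it otherwise, which is how the unit test signal enters model optimization.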
The team conducted experiments on the challenging APPS (Automated Programming Progress Standard) code generation benchmark (Hendrycks et al., 2021) to evaluate the performance of the proposed CodeRL, and used the MBPP (Mostly Basic Programming Problems) benchmark (Austin et al., 2021) to evaluate its zero-shot ability.
On APPS, the researchers compared their models with strong baselines including GPT-2, GPT-Neo, GPT-3, Codex, and AlphaCode. CodeRL with CodeT5 achieved new SOTA results of 2.69 percent pass@1, 6.81 percent pass@5, and 20.98 percent pass@1000.
On MBPP, CodeRL with CodeT5 obtained surprisingly strong zero-shot performance, achieving a new SOTA of 63.0 percent pass@80, surpassing GPT-137B’s 61.4 percent pass@80.
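The pass@k metric reported above estimates the probability that at least one of k sampled programs passes all unit tests. The standard unbiased estimator, introduced with Codex (Chen et al., 2021), can be sketched as:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator (Chen et al., 2021).

    n = total programs sampled per problem
    c = number of those programs passing all unit tests
    k = evaluation budget
    Returns 1 - C(n-c, k) / C(n, k): the chance that a random
    size-k subset of the n samples contains at least one pass.
    """
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 2 samples of which 1 passes, pass@1 is 0.5, matching the intuition that a random single draw succeeds half the time.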
This work shows that CodeRL can effectively leverage unit test signals to push code generation to new SOTA performance while achieving strong zero-shot transfer capabilities.
Author: Hecate He | Editor: Michael Sarazen