In recent years, AI systems have achieved or surpassed human-level performance on games such as Go and StarCraft, on prediction tasks in medical imaging, and even in optimizing microchip architectures. Could coding be next?
The idea of automating coding is not new, and today’s powerful large-scale language models have already demonstrated their code-generation potential for simple tasks such as webpage design. These models, however, struggle with more complex, unseen problems that demand problem-solving skills beyond simply translating instructions into code.
In the new paper Competition-Level Code Generation with AlphaCode, a DeepMind research team introduces a system that uses transformer-based language models to generate code and create novel solutions for programming problems that require deep reasoning. Tested in competitions with more than 5,000 human participants, AlphaCode achieved an average ranking within the top 54.3 percent.
The DeepMind researchers identify three critical components that enabled their model to reach human-competitive performance in code-generation tasks: 1) an extensive and clean competitive programming dataset for training and evaluation; 2) large and efficient-to-sample transformer-based architectures; and 3) large-scale model sampling to explore the search space, followed by filtering based on program behavior to a small set of submissions.
The proposed AlphaCode model is first pretrained on a collection of open-source code from GitHub to enable it to learn good code representations and generate code fluently. To help the model adapt to the target competitive programming domain, it is subsequently fine-tuned on CodeContests, a competitive programming dataset comprising programming problems compiled from a variety of sources.
In the evaluation step, AlphaCode generates candidate C++ and Python programs for each problem, drawing orders of magnitude more samples than previous work. These samples are filtered using the example tests and then clustered by program behavior to obtain a small set of candidate submissions (at most 10), which are then evaluated on hidden test cases. This automated system can thus spare programmers the time- and energy-consuming trial-and-error process of debugging, compiling, passing tests, and eventually submitting their code.
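The filter-then-cluster step can be illustrated with a minimal Python sketch. It is not DeepMind's implementation: the candidate programs are stood in for by plain Python functions, and the names `run_safely`, `filter_by_examples`, `cluster_by_behavior`, `select_submissions`, and `probe_inputs` are all hypothetical. The sketch assumes that "clustering by behavior" means grouping programs that produce identical outputs on a set of extra inputs, and that one program is taken from each of the largest groups.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a generated program: maps input text to output text.
Program = Callable[[str], str]


def run_safely(program: Program, test_input: str) -> str:
    """Run a candidate; any exception is recorded as a special error output."""
    try:
        return program(test_input)
    except Exception:
        return "<error>"


def filter_by_examples(candidates: Sequence[Program],
                       example_tests: Sequence[Tuple[str, str]]) -> List[Program]:
    """Keep only the candidates that reproduce every example output."""
    return [p for p in candidates
            if all(run_safely(p, i).strip() == o.strip() for i, o in example_tests)]


def cluster_by_behavior(candidates: Sequence[Program],
                        probe_inputs: Sequence[str]) -> List[List[Program]]:
    """Group candidates whose outputs agree on all probe inputs (assumed criterion)."""
    clusters = defaultdict(list)
    for p in candidates:
        signature = tuple(run_safely(p, x) for x in probe_inputs)
        clusters[signature].append(p)
    # Largest clusters first; sorted() is stable, so ties keep insertion order.
    return sorted(clusters.values(), key=len, reverse=True)


def select_submissions(candidates, example_tests, probe_inputs, max_submissions=10):
    """Filter on the example tests, cluster the survivors, take one per cluster."""
    survivors = filter_by_examples(candidates, example_tests)
    clusters = cluster_by_behavior(survivors, probe_inputs)
    return [cluster[0] for cluster in clusters[:max_submissions]]


# Toy problem: "print twice the input integer", with one example test.
def double_a(s): return str(int(s) * 2)
def double_b(s): return str(int(s) + int(s))    # same behavior as double_a
def square(s): return str(int(s) ** 2)          # passes the example by coincidence
def echo(s): return s                           # fails the example test
def abs_double(s): return str(abs(int(s)) * 2)  # differs on negative inputs

candidates = [double_a, double_b, square, echo, abs_double]
picked = select_submissions(candidates, example_tests=[("2", "4")],
                            probe_inputs=["3", "-1", "0"], max_submissions=2)
print([p.__name__ for p in picked])  # ['double_a', 'square']
```

In this toy run, `echo` is dropped by the example test, the two equivalent doubling programs fall into one cluster, and the submission budget of two is spent on distinct behaviors rather than on duplicates.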
In their empirical study, the team evaluated AlphaCode on the Codeforces platform and on CodeContests, and compared it with published models on the public APPS (Hendrycks et al., 2021) benchmark of programming problems.
The results show that AlphaCode performs roughly at the level of an average human participant in a coding competition. The researchers further note that AlphaCode does not simply copy important parts of previous solutions or exploit weaknesses in the problem structure, and that it is able to solve unseen problems that require a combination of critical thinking, logic, algorithms, coding, and natural language understanding.
DeepMind has released its CodeContests dataset of competitive programming problems and solutions on GitHub. The paper Competition-Level Code Generation with AlphaCode is on Google Cloud Storage.
Author: Hecate He | Editor: Michael Sarazen