Last June, the popular GitHub code repository and developer community platform released GitHub Copilot, a code auto-completion tool powered by OpenAI’s Codex model that FastAPI creator Sebastián Ramírez described as “seriously mind-blowing, difficult to overhype” and “the future of coding.”
Copilot and other neural code completion systems leverage language models to suggest helpful code snippets to developers based on contextual information in the integrated development environment (IDE). While such systems have proven their ability to generate appropriate code automatically, there is no standard method or metric for evaluating the productivity gains offered by different code completion systems.
In the new paper Productivity Assessment of Neural Code Completion, a GitHub research team explores whether usage measurements of developer interactions with GitHub Copilot can be used to predict productivity as reported by developers. The team surveyed thousands of Copilot users and compared their responses to usage measurements collected on the IDE based on code contribution and acceptance rates.
To evaluate how well performance on programming-competition benchmarks generalizes to interactive development in an IDE, the researchers define acceptance rate as the fraction of shown code completions that the developer subsequently accepts and integrates into the final source code. This enables them to look for correlations between usage measurements of developer-Copilot interactions and productivity as reported in the developer survey.
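The acceptance-rate metric described above can be sketched as a simple computation over IDE completion events. The following is a minimal illustration; the `CompletionEvent` record and its field names are hypothetical, not the paper's or Copilot's actual telemetry schema.

```python
from dataclasses import dataclass

# Hypothetical event record for completion telemetry; field names are
# illustrative only, not taken from the paper or from Copilot itself.
@dataclass
class CompletionEvent:
    shown: bool      # a suggestion was displayed to the developer
    accepted: bool   # the developer accepted (inserted) the suggestion

def acceptance_rate(events: list[CompletionEvent]) -> float:
    """Fraction of shown suggestions that were accepted."""
    shown = [e for e in events if e.shown]
    if not shown:
        return 0.0
    return sum(e.accepted for e in shown) / len(shown)

# Example: four suggestions shown, two accepted -> acceptance rate 0.5
events = [
    CompletionEvent(shown=True, accepted=True),
    CompletionEvent(shown=True, accepted=False),
    CompletionEvent(shown=True, accepted=True),
    CompletionEvent(shown=True, accepted=False),
]
print(acceptance_rate(events))  # 0.5
```

A rate computed this way is cheap to log continuously, which is what makes it attractive for monitoring a deployed completion system.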
The researchers surveyed 17,420 Copilot users with questions regarding demographic information and Likert-style questions about productivity aspects. The survey defines productivity in five dimensions: satisfaction and well-being, performance, activity, communication and collaboration, and efficiency and flow.
Based on their study, the team concludes that the rate at which shown suggestions are accepted captures developers' perception of productivity better than more specific metrics regarding the persistence of completions in the code over time. They find that acceptance rate is the strongest positive predictor of users' perceived productivity, although they note that, given confounding and human factors, notable unexplained variance remains. The results support the premise that acceptance rate can be used for coarse-grained monitoring of the performance of a neural code synthesis system.
The team believes theirs is the first study on code suggestion tools to identify a clear link between usage measurements and developer productivity or happiness. Overall, the work confirms neural code completion systems’ ability to boost developer productivity and provides useful insights for a more concrete evaluation of the productivity gains they offer.
The paper Productivity Assessment of Neural Code Completion is on arXiv.
Author: Hecate He | Editor: Michael Sarazen