code generation

by Synced 2024-05-13

IBM’s Granite Code: Powering Enterprise Software Development with AI Precision

An IBM research team introduces the Granite Code model family. Specifically optimized for enterprise software development workflows, these models perform well across a range of coding tasks, such as code generation, fixing, and explanation, which makes them versatile, general-purpose code models.

by Synced 2023-12-29

AI Machine Learning & Data Science Research

Precision Coding Redefined: Microsoft WaveCoder’s Pioneering Approach to Fine-Tuned LLM Performance

In a new paper WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation, a Microsoft research team introduces CodeOcean, an instruction-tuning dataset generated from source code with explicit control over data quality, which significantly improves the generalization ability of fine-tuned LLMs.

by Synced 2023-09-29

AI Machine Learning & Data Science Research

Microsoft’s CodePlan: Unleashing the Power of Language Models for Repository-Level Coding Tasks

In a recent paper, “CodePlan: Repository-level Coding using LLMs and Planning,” a team from Microsoft Research introduces CodePlan, a versatile framework for repository-level coding tasks, which involve extensive code changes across large, interconnected codebases.

by Synced 2023-09-05

AI Machine Learning & Data Science Research

MIT’s AskIt Provides A Unified Programming Interface for Code Generation with LLMs

In a new paper AskIt: Unified Programming Interface for Programming with Large Language Models, an MIT CSAIL research team presents AskIt, a domain-specific language (DSL) tailored for LLMs that accommodates a wide variety of tasks and substantially reduces the development overhead and effort for software practitioners.

by Synced 2023-08-29

AI Machine Learning & Data Science Nature Language Tech Research

Meta AI Open Sources Code Llama: A SOTA Code-Specialized Llama 2

In a new paper Code Llama: Open Foundation Models for Code, a Meta AI research team releases Code Llama, a family of code-specialized Llama 2 models for code generation and infilling that achieve state-of-the-art performance among open models on code benchmarks.

by Synced 2023-06-28

AI Machine Learning & Data Science Research

Microsoft’s Crafted “Textbook Quality” Data Are All You Need to Train a 10× Smaller Yet Strong Language Model for Code

In a new paper Textbooks Are All You Need, a Microsoft research team crafts ‘textbook quality’ data for training a large language model for code. Despite having a mere 1.3B parameters, the resulting phi-1 model achieves 50.6 percent pass@1 accuracy on HumanEval, competitive with far larger large language models (LLMs).

by Synced 2023-05-17

AI Machine Learning & Data Science Research

Salesforce AI’s CodeT5+ Open Code LLMs Flexibly Adapt to Diverse Downstream Code Understanding and Generation Tasks

In the new paper CodeT5+: Open Code Large Language Models for Code Understanding and Generation, a Salesforce AI Research team presents CodeT5+, a novel family of encoder-decoder code foundation large language models that can be flexibly adapted to a wide range of code understanding and generation tasks and that outperform prior models on various code-related benchmarks.

by Synced 2023-05-16

AI Machine Learning & Data Science Nature Language Tech Research

‘May the Source Be With You!’ – BigCode’s Open-Access StarCoder Outperforms All Existing Open Code LLMs

In the new paper StarCoder: May the Source Be With You!, the BigCode community releases StarCoder and StarCoderBase, 15.5B-parameter open-access large language models (LLMs) trained on 80+ programming languages. StarCoderBase outperforms every open code LLM that supports multiple programming languages, and StarCoder surpasses all models fine-tuned on Python.

by Synced 2023-02-28

AI Machine Learning & Data Science Nature Language Tech Research

CMU & Inspired Cognition’s DocPrompting Improves Code Generation by Retrieving Relevant Documentation

In the new paper DocPrompting: Generating Code by Retrieving the Docs, a research team from Carnegie Mellon University and Inspired Cognition presents DocPrompting, a natural-language-to-code generation approach. Given a natural language intent that requires calling functions or libraries the model has never seen, DocPrompting retrieves the relevant code documentation and conditions code generation on it, enabling the model to perform the task.
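The retrieve-then-generate idea can be sketched in a few lines. This is a minimal illustration, not the paper's method: a toy token-overlap scorer stands in for a real retriever, and the helper names (`retrieve`, `build_prompt`), the prompt layout, and the sample docs are hypothetical. The resulting prompt would then be passed to a code generation model.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately crude tokenizer for illustration."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def score(intent: str, doc: str) -> int:
    """Count tokens shared by the intent and a doc (with multiplicity)."""
    shared = Counter(tokenize(intent)) & Counter(tokenize(doc))
    return sum(shared.values())


def retrieve(intent: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k docs that overlap most with the intent."""
    ranked = sorted(docs, key=lambda name: score(intent, docs[name]), reverse=True)
    return ranked[:k]


def build_prompt(intent: str, docs: dict[str, str], k: int = 2) -> str:
    """Hypothetical prompt layout: retrieved docs first, then the intent."""
    lines = [f"# Doc ({name}): {docs[name]}" for name in retrieve(intent, docs, k)]
    lines.append(f"# Intent: {intent}")
    lines.append("# Code:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical documentation snippets keyed by command.
    docs = {
        "tar -x": "extract files from an archive",
        "tar -c": "create a new archive",
        "ls -a": "list all files including hidden entries",
    }
    print(build_prompt("extract the files from archive.tar", docs, k=1))
```

Here the intent shares four tokens with the `tar -x` entry and only one with each of the others, so that entry is the single doc placed in the prompt. Because the documentation arrives at inference time, the same model can handle a newly documented function without retraining.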

by Synced 2022-12-12

AI Machine Learning & Data Science Nature Language Tech Research

ServiceNow Research & Hugging Face Release The Stack: 3 TB of Permissively Licensed Source Code for LLMs

In the new paper The Stack: 3 TB of Permissively Licensed Source Code, a team from ServiceNow Research and Hugging Face advances open and responsible research on code LLMs by releasing The Stack, a 3.1 TB dataset of permissively licensed source code in 30 programming languages.

by Synced 2022-08-18

AI Machine Learning & Data Science Research

Microsoft, UPenn & UC San Diego’s TiCoder Framework Generates Code That Is 90.4% Consistent With User Intent

In the new paper Interactive Code Generation via Test-Driven User-Intent Formalization, a team from Microsoft Research, the University of Pennsylvania, and the University of California, San Diego proposes a workflow for test-driven user-intent formalization that leverages user feedback to generate code that is 90.40 percent consistent with user intent.
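A simplified sketch of the test-driven feedback loop: candidate programs and proposed tests are given, the user approves or rejects each test, and candidates inconsistent with any answer are pruned. This is an illustration under stated assumptions, not TiCoder's implementation; the single-integer function signature, the `(input, expected)` test format, and the names `passes` and `interactive_prune` are all hypothetical, and the user is simulated by a function.

```python
from typing import Callable

# Hypothetical types: a candidate maps one int to an int; a test is (input, expected output).
Candidate = Callable[[int], int]
Test = tuple[int, int]


def passes(candidate: Candidate, test: Test) -> bool:
    """True if the candidate returns the expected output; a crash counts as failure."""
    x, expected = test
    try:
        return candidate(x) == expected
    except Exception:
        return False


def interactive_prune(
    candidates: list[Candidate],
    tests: list[Test],
    user_approves: Callable[[Test], bool],
) -> tuple[list[Candidate], list[Test]]:
    """Ask the user about each test; keep candidates consistent with every answer."""
    remaining = list(candidates)
    approved: list[Test] = []
    for test in tests:
        if user_approves(test):
            # The test matches the user's intent: drop candidates that fail it.
            approved.append(test)
            remaining = [c for c in remaining if passes(c, test)]
        else:
            # The user rejected this input/output pair: drop candidates that produce it.
            remaining = [c for c in remaining if not passes(c, test)]
    return remaining, approved


def identity(x: int) -> int:
    return x


def negate(x: int) -> int:
    return -x


def branchy_abs(x: int) -> int:
    return x if x > 0 else -x


def wants_abs(test: Test) -> bool:
    """Simulated user whose intent is 'absolute value'."""
    return test[1] == abs(test[0])


if __name__ == "__main__":
    candidates = [identity, negate, abs, branchy_abs]
    remaining, approved = interactive_prune(candidates, [(-3, 3), (-1, -1), (2, 2)], wants_abs)
    print([f.__name__ for f in remaining], approved)
    # prints: ['abs', 'branchy_abs'] [(-3, 3), (2, 2)]
```

The approved tests double as a formal, checkable record of the intent, so the surviving candidates come with evidence that they match what the user confirmed.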

by Synced 2022-07-07

AI Machine Learning & Data Science Research

Salesforce’s CodeRL Achieves SOTA Code Generation Results With Strong Zero-Shot Transfer Capabilities

In the new paper CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning, a Salesforce Research team presents CodeRL, a novel program synthesis framework that combines pretrained language models (LMs) with deep reinforcement learning (RL). CodeRL achieves state-of-the-art performance on the challenging APPS benchmark and demonstrates strong zero-shot transfer capabilities.

by Synced 2022-02-04

AI Machine Learning & Data Science Research

DeepMind’s AlphaCode Generates Code at a Level Competitive With Human Programmers

A DeepMind research team presents AlphaCode, an automated code-generation system that creates novel solutions to programming problems requiring deep reasoning. In simulated evaluations on Codeforces competitions with more than 5,000 participants, AlphaCode achieves an average ranking in the top 54.3%.

by Synced 2021-07-13

AI Machine Learning & Data Science Nature Language Tech Research

OpenAI Fine-Tunes GPT-3 to Unlock Its Code Generation Potential for Difficult Problems

A research team from OpenAI proposes Codex, a GPT model fine-tuned on publicly available code from GitHub that produces functionally correct Python function bodies from natural language docstrings. On the team’s HumanEval benchmark, Codex solves 28.8 percent of the problems with a single sample.
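The Codex paper measures functional correctness with pass@k. For each problem it draws n ≥ k samples, counts the c samples that pass the unit tests, and estimates pass@k = 1 - C(n-c, k) / C(n, k), then averages over problems. Below is a minimal sketch of that estimator; the function names and the sample counts in the usage line are illustrative assumptions. The paper's reference code uses a numerically stable product form with NumPy. Here Python's exact integer `math.comb` keeps the direct form accurate.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples must contain a correct one.
        return 1.0
    # 1 minus the probability that a random size-k subset has no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Two hypothetical problems with 10 samples each: 3 correct and 0 correct.
    print(mean_pass_at_k([(10, 3), (10, 0)], k=1))  # about 0.15
```

For k = 1 the estimator reduces to c / n. Drawing n > k samples and estimating, rather than drawing exactly k, lowers the variance of the estimate.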