
Google & CMU Open-Source a Library for Graph Representation of Python Programs for Machine Learning Research

Google Research and Carnegie Mellon University have open-sourced a library for constructing Python program graph representations used in machine learning for code research. Details are presented in the report A Library for Representing Python Programs as Graphs for Machine Learning.

Graph representations of source code, such as abstract syntax trees (ASTs), control-flow graphs (CFGs) and data-flow graphs, are now commonly employed by machine learning researchers working on code.
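To make the idea of a program graph concrete, here is a minimal sketch that uses only Python's standard ast module, not the library described in this article. It parses a small function and lists every parent-child link in its abstract syntax tree as an edge. The names SOURCE and ast_edges are hypothetical and chosen for illustration.

```python
import ast

# A tiny program to analyze (illustrative input, not from the paper).
SOURCE = """
def absolute(x):
    if x < 0:
        return -x
    return x
"""


def ast_edges(source):
    """Return (parent_type, child_type) pairs for each parent-child link in the AST."""
    tree = ast.parse(source)
    edges = []
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            edges.append((type(parent).__name__, type(child).__name__))
    return edges


edges = ast_edges(SOURCE)
for parent_type, child_type in edges:
    print(f"{parent_type} -> {child_type}")
```

A control-flow or data-flow graph would add further edge types on top of these syntactic ones, for example an edge from the if test to each statement that may execute next.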

In the new paper A Library for Representing Python Programs as Graphs for Machine Learning, a team from Google Research and Carnegie Mellon University open-sources a library that applies static analysis to construct Python program graph representations for machine learning research.

The python_graphs library can construct some graph types (e.g., control-flow graphs and composite program graphs) directly from arbitrary Python programs and provides tools for constructing others (e.g., interprocedural control-flow graphs, novel composite program graphs, additional data-flow graphs and span-mapped graphs).

It accepts three input types: a Python function, a source code string, or an abstract syntax tree. Whichever form is supplied, the input is normalized to an abstract syntax tree before analysis.
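The normalization step can be sketched with the standard library alone. The helper below, to_ast, is a hypothetical name and is not the library's own function. It assumes that a function's source is retrievable with inspect.getsource, which is not the case for functions defined interactively or built in C.

```python
import ast
import inspect
import textwrap


def to_ast(program):
    """Normalize a function, a source string, or an AST node into an AST.

    Hypothetical helper for illustration; the library's own entry point may differ.
    """
    if isinstance(program, ast.AST):
        # Already an AST: return it unchanged.
        return program
    if isinstance(program, str):
        # Source text: dedent so that indented snippets still parse.
        return ast.parse(textwrap.dedent(program))
    if callable(program):
        # Function object: recover its source text, then parse it.
        source = inspect.getsource(program)
        return ast.parse(textwrap.dedent(source))
    raise TypeError(f"Unsupported input type: {type(program).__name__}")
```

With a single normalized form, every downstream analysis only has to handle AST input.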

The team summarizes the library’s four main use cases as: 1) control_flow, for computing control-flow graphs statically from Python programs, 2) data_flow, for computing data-flow analyses of Python programs, 3) program_graph, for computing graphs statically to represent arbitrary Python programs or functions, and 4) cyclomatic_complexity, for computing the cyclomatic complexity of a Python function.
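As an illustration of the last use case, cyclomatic complexity can be approximated as the number of decision points in a function plus one. The sketch below uses only the standard ast module and does not reproduce the library's implementation. The function name, the example program, and the choice of which node types count as decision points are all assumptions of this sketch.

```python
import ast

# Node types treated as decision points here (an assumption of this sketch).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def approximate_cyclomatic_complexity(source):
    """Approximate cyclomatic complexity as decision points + 1."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" contributes two short-circuit branches.
            decisions += len(node.values) - 1
    return decisions + 1


EXAMPLE = """
def classify(values):
    total = 0
    for v in values:
        if v > 0 and v % 2 == 0:
            total += v
    return total
"""

print(approximate_cyclomatic_complexity(EXAMPLE))
```

In the example, the for loop, the if statement and the and operator each add one decision point, so the function reports 4. A graph-based computation would instead count the nodes and edges of the control-flow graph.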

The python_graphs library has been running successfully on millions of programs for many years. To verify that it copes with the diversity of language features found in realistic code, the researchers recently applied it to a dataset of 3.3 million programs from Project CodeNet, with the results confirming its flexibility and robustness.

The library is being open-sourced to encourage its broader use and as a potential framework for future studies on graph representations of programs and how they may contribute to the machine learning research community.

The code is available on the project’s GitHub. The paper A Library for Representing Python Programs as Graphs for Machine Learning is on arXiv.


Author: Hecate He | Editor: Michael Sarazen


