Scientific computing — loosely defined as the use of computational models to solve science and engineering problems — has benefited greatly in recent years from the rapid development of machine learning technologies in artificial intelligence. Now, researchers are exploring ways to bridge the two worlds.
A research team from Julia Computing and the Massachusetts Institute of Technology proposes that the scientific computing and machine learning domains both depend on linear algebra over their underlying structures. The team has introduced a novel computational infrastructure in the form of differentiable programming (∂P), which can calculate model gradients by integrating automatic differentiation into the language as a first-class feature. Programmers can write models directly in the Julia programming language.
The system supports almost all language constructs and compiles high-performance code without requiring any user intervention or refactoring to stage computations. This enables programmers to build deep learning models with existing Julia scientific computing packages and to compute gradients efficiently.
Differentiable programming is a programming paradigm in which programs can be differentiated end to end. Given a dataset, a neural network can automatically learn the mapping from the input data X to the final result Y, with the network constituting the entire program; alternatively, the network can serve as an intermediate function that, combined with high-level code written by the programmer, completes the program.
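The core idea of treating differentiation as a language feature can be illustrated with a minimal sketch. The paper's system is implemented in Julia; the hypothetical `Dual` class below only mimics the forward-mode flavor of automatic differentiation in Python, and is not code from the paper:

```python
# Minimal forward-mode automatic differentiation with dual numbers.
# Each value carries (primal, derivative); arithmetic propagates both.
class Dual:
    def __init__(self, val, eps=0.0):
        self.val = val   # primal value
        self.eps = eps   # derivative w.r.t. the chosen input

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.eps * other.val + self.val * other.eps)

    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate df/dx by seeding the dual part with 1."""
    return f(Dual(x, 1.0)).eps

# Any ordinary function built from + and * becomes differentiable
# without rewriting it for a special framework:
f = lambda x: 3 * x * x + 2 * x + 1   # f'(x) = 6x + 2
print(derivative(f, 4.0))             # 26.0
```

The point of ∂P is that this kind of gradient machinery is built into the compiler itself, so ordinary scientific code gains gradients with no user intervention.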
Facebook Chief AI Scientist Yann LeCun commented on the use of differentiable programming: “People are now building a new kind of software by assembling networks of parameterized functional blocks and by training them from examples using some form of gradient-based optimization.”
As the basis of a common infrastructure shared by the machine learning and scientific computing disciplines, this differentiable programming system enables new applications that unite the two domains by using the same technology to differentiate programs in both. The researchers demonstrate performance on par with existing ML frameworks for deep learning models (on CPUs, GPUs, and TPUs) and in reinforcement learning. Differentiable programming can also be extended to other scientific computing domains, for example neural stochastic differential equations (SDEs) and quantum machine learning.
Y Combinator researcher Michael Nielsen tweeted that he hopes the new paper “is part of a trend exploring more and more in this direction” and echoed a remark from Tesla AI Director Andrej Karpathy: “gradient descent is a better programmer.” Karpathy replied to Nielsen’s tweet: “We’re moving up the stack a bit; instead of writing an explicit, fully defined program we write a rough sketch ‘tube’ of a program (tube parameterized by some θ), and then if you have an evaluatable metric the best point in the tube gets selected via optimization.”
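Karpathy's “tube” metaphor can be sketched in a few lines of Python. This is a toy illustration, not code from the paper: the program's behavior is left open as a parameter θ, and gradient descent on an evaluatable metric (here, mean squared error against examples) selects the best point in the tube:

```python
# Toy "program tube": behavior is determined by parameter theta,
# and gradient descent picks the theta that best matches the examples.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # examples of x -> y (here y = 3x)

def program(theta, x):
    """A rough sketch of a program, parameterized by theta."""
    return theta * x

theta = 0.0  # initial point in the tube
lr = 0.05    # learning rate
for _ in range(200):
    # Gradient of mean squared error w.r.t. theta, derived by hand:
    # d/dtheta (theta*x - y)^2 = 2*(theta*x - y)*x
    grad = sum(2 * (program(theta, x) - y) * x for x, y in data) / len(data)
    theta -= lr * grad

print(round(theta, 3))  # converges near 3.0
```

Instead of writing the fully defined program `3 * x` by hand, the programmer writes the template and lets optimization fill in the specifics, which is the shift Karpathy describes.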
The paper ∂P: A Differentiable Programming System to Bridge Machine Learning and Scientific Computing is on arXiv.
Author: Yuqing Li | Editor: Michael Sarazen & Tony Peng