The past decade has seen a burst of algorithms and applications in machine learning, especially deep learning. Behind this burst is a wide variety of deep learning tools and frameworks. They are the scaffolding of the machine learning revolution: the widespread adoption of deep learning frameworks like TensorFlow and PyTorch has enabled many ML practitioners to assemble models more easily, using well-suited domain-specific languages and a rich collection of building blocks.
Looking back at the evolution of deep learning frameworks, we can clearly see a tightly coupled relationship between the frameworks and deep learning algorithms. This virtuous cycle of interdependency propels the rapid development of deep learning frameworks and tools into the future.
Stone Age (early 2000s)
The concept of neural networks has been around for a while. In the early 2000s, there were a handful of tools that could be used to describe and develop neural networks, including MATLAB, OpenNN, and Torch. They were either not tailored specifically for neural network model development or had complex user APIs and lacked GPU support. During this time, ML practitioners had to do a lot of heavy lifting when using these primitive deep learning frameworks.
Bronze Age (~2012)
In 2012, Alex Krizhevsky et al. from the University of Toronto proposed a deep neural network architecture, later known as AlexNet [1], that achieved state-of-the-art accuracy on the ImageNet dataset and outperformed the second-place contestant by a large margin. This outstanding result sparked excitement about deep neural networks, and since then various deep neural network models have kept setting new accuracy records on ImageNet.
Around this time, some early deep learning frameworks such as Caffe, Chainer and Theano came into being. Using these frameworks, users could conveniently build complex deep neural network models such as CNNs, RNNs, and LSTMs. In addition, these frameworks supported multi-GPU training, which significantly reduced the time to train these models and enabled training large models that previously could not fit into a single GPU's memory. Among these frameworks, Caffe and Theano used a declarative programming style while Chainer adopted the imperative programming style. These two distinct programming styles also set two different development paths for the deep learning frameworks that were yet to come.
Iron Age (2015~2016)
As the success of AlexNet drew great attention in computer vision and rekindled hope in neural networks, large tech companies joined the effort to develop deep learning frameworks. Google open-sourced the famous TensorFlow framework, which remains the most popular deep learning framework in the ML field to date. The inventor of Caffe joined Facebook and went on to release Caffe2; at the same time, the Facebook AI Research (FAIR) team released another popular framework, PyTorch, which was based on the Torch framework but offered the more popular Python APIs. Microsoft Research developed the CNTK framework, and Amazon adopted MXNet, a joint academic project from the University of Washington, CMU and others.
TensorFlow and CNTK borrowed the declarative programming style from Theano, whereas PyTorch inherited the intuitive and user-friendly imperative programming style from Torch. The imperative style is more flexible (for example, a while loop is written as ordinary code) and easier to trace, while the declarative style often leaves more room for memory and runtime optimization based on the compute graph. MXNet, dubbed “mix”-net, enjoyed the benefits of both worlds: it supported a set of symbolic (declarative) APIs and a set of imperative APIs at the same time, and it optimized the performance of models described with the imperative APIs via a method called hybridization.
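To make the contrast between the two programming styles concrete, here is a minimal sketch in plain Python. It does not use the API of any framework named above; `Node`, `placeholder` and `run` are hypothetical names invented for this illustration. The imperative function computes values as each line executes, while the declarative version first builds a small graph and computes nothing until the graph is run with concrete inputs.

```python
# Imperative (define-by-run): every line executes immediately on real values.
def imperative_forward(x, w, b):
    h = x * w          # computed right away, easy to inspect or debug
    return h + b


# Declarative (define-then-run): operators only record a graph node.
class Node:
    """Hypothetical graph node: an operation plus its input nodes."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op
        self.inputs = inputs
        self.name = name

    def __mul__(self, other):
        return Node("mul", (self, other))

    def __add__(self, other):
        return Node("add", (self, other))


def placeholder(name):
    """A named graph input whose value is supplied later."""
    return Node("input", name=name)


def run(node, feed):
    """Evaluate the graph recursively, reading inputs from `feed`."""
    if node.op == "input":
        return feed[node.name]
    args = [run(child, feed) for child in node.inputs]
    if node.op == "mul":
        return args[0] * args[1]
    if node.op == "add":
        return args[0] + args[1]
    raise ValueError(f"unknown op: {node.op}")


x, w, b = placeholder("x"), placeholder("w"), placeholder("b")
graph = x * w + b      # builds Node("add", ...); no arithmetic happens here

eager_result = imperative_forward(2.0, 3.0, 1.0)
graph_result = run(graph, {"x": 2.0, "w": 3.0, "b": 1.0})
print(eager_result, graph_result)   # both are 7.0
```

Because the whole `graph` object exists before any value is computed, a framework in this style can inspect and rewrite it ahead of time, which is where the extra room for optimization comes from. The imperative function, by contrast, can be stepped through like any other Python code.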
In 2015, ResNet [2] was proposed by Kaiming He et al. and again pushed the boundary of image classification by setting another record in ImageNet accuracy. A consensus was reached in both industry and academia that deep learning was going to be the next big technology trend, solving challenges in various fields that had not been deemed solvable before. During this period, all deep learning frameworks were polished to provide clearly defined user APIs, were optimized for multi-GPU and distributed training, and spawned many model zoos and toolkits targeted at specific tasks such as computer vision and natural language processing. It is also worth noting that François Chollet almost single-handedly developed the Keras framework, which provides a more intuitive high-level abstraction of neural networks and building blocks on top of existing frameworks such as TensorFlow and MXNet. This abstraction has become the de facto model-level API in TensorFlow today.
Roman Times (2019~2020)
Just as in human history, a round of fierce competition among deep learning frameworks gave way to a duopoly of two big “empires”: TensorFlow and PyTorch, which together represented more than 95% of the use cases of deep learning frameworks in research and production. The Chainer team transitioned their development effort to PyTorch in 2019; similarly, Microsoft stopped active development of the CNTK framework, and part of the team moved to support PyTorch on Windows and ONNX Runtime. Keras was assimilated by TensorFlow and became one of its high-level APIs in the TensorFlow 2.0 release. MXNet remained a distant third in the deep learning framework space.
There were two trends in the deep learning framework space during this period. The first was large model training. With the birth of BERT [3] and its Transformer-based relatives such as GPT-3 [4], the ability to train large models became a desired feature of deep learning frameworks. This required a framework to train efficiently at a scale of hundreds if not thousands of devices. The second trend was usability. All the deep learning frameworks of this period adopted the imperative programming style for its flexible semantics and easy debugging. At the same time, these frameworks also provided user-level decorators or APIs to achieve high performance through JIT (just-in-time) compilation techniques.
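The idea of keeping imperative code while recovering graph-level performance through a decorator can be illustrated with a toy tracing decorator. This is a sketch under strong assumptions, not the implementation of any real framework: `toy_jit`, `Tracer` and `Program` are hypothetical names, only `+` and `*` with the traced value on the left are supported, and the "compiled" form is just a recorded list of operations that gets replayed. On the first call, the decorated function runs once on stand-in values that record what it does; later calls replay the recording without re-running the Python function.

```python
import functools
import operator

# Hypothetical toy "JIT": every name here is illustrative, not a real API.
OPS = {"add": operator.add, "mul": operator.mul}


class Program:
    """A recorded, replayable list of arithmetic operations."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.ops = []        # entries: (op_name, left_slot, right_ref)
        self.output = None   # slot index that holds the final result

    def emit(self, op, left_slot, right_ref):
        self.ops.append((op, left_slot, right_ref))
        return self.num_inputs + len(self.ops) - 1   # slot of the new value

    def run(self, *args):
        slots = list(args)   # slots 0..num_inputs-1 hold the inputs
        for op, left_slot, (kind, value) in self.ops:
            rhs = slots[value] if kind == "slot" else value
            slots.append(OPS[op](slots[left_slot], rhs))
        return slots[self.output]


class Tracer:
    """Stand-in value that records arithmetic instead of performing it."""

    def __init__(self, program, slot):
        self.program = program
        self.slot = slot

    def _binary(self, op, other):
        if isinstance(other, Tracer):
            ref = ("slot", other.slot)
        else:
            ref = ("const", other)
        return Tracer(self.program, self.program.emit(op, self.slot, ref))

    def __add__(self, other):
        return self._binary("add", other)

    def __mul__(self, other):
        return self._binary("mul", other)


def toy_jit(fn):
    """Trace `fn` on first use, then replay the recorded program."""
    cache = {}

    @functools.wraps(fn)
    def wrapper(*args):
        key = len(args)
        if key not in cache:
            program = Program(len(args))
            tracers = [Tracer(program, i) for i in range(len(args))]
            program.output = fn(*tracers).slot
            cache[key] = program
        return cache[key].run(*args)

    return wrapper


trace_log = []


@toy_jit
def affine(x, w, b):
    trace_log.append("traced")   # Python side effect: happens only while tracing
    return x * w + b


first = affine(2.0, 3.0, 1.0)    # traces, then runs the recorded program
second = affine(4.0, 5.0, 6.0)   # replays the recording only
print(first, second, len(trace_log))
```

The sketch also hints at a well-known limitation of tracing: anything the function does that depends on actual data values, such as a data-dependent `if`, is not captured by a single trace, so real systems need additional machinery to handle it.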
Industrial Age (2021+)
The huge success of deep learning in a wide range of fields, from self-driving and personalized recommendation to natural language understanding and health care, has brought in an unprecedented wave of users, developers and investors. The coming decade is a golden time for developing deep learning tools and frameworks. Although deep learning frameworks have improved significantly since their inception, they are still far from being as mature as programming languages such as Java and C++ are for the development of Internet applications. Many exciting opportunities remain to be explored, and much work remains to be done.
Looking forward, a few technical trends promise to become mainstream in the next generation of deep learning frameworks:
- Compiler-based operator optimization. Today many operator kernels are implemented either manually or via third-party libraries such as BLAS, cuDNN and oneDNN that target a specific hardware platform. This causes a lot of overhead when a model is trained or deployed on different hardware platforms. In addition, new deep learning algorithms often appear much faster than these libraries iterate, so new operators are often not supported by them. Deep learning compilers such as Apache TVM, MLIR and Facebook Glow have been proposed to optimize and run computations efficiently on any hardware backend. They are well positioned to serve as the entire backend stack in deep learning frameworks.
- Unified API standards. Many deep learning frameworks share similar but slightly different user APIs. This creates difficulty and an unnecessary learning curve for users switching from one framework to another. Because the majority of machine learning practitioners and data scientists are familiar with the NumPy library, it is natural for the NumPy API to become the standard for tensor manipulation APIs in new deep learning frameworks. We are already seeing a warm reception from users for the rapidly growing framework JAX, whose array API closely follows NumPy.
- Data movement as a first-class citizen. Multi-node or multi-device training is becoming the norm for deep neural network training. Recently developed deep learning frameworks, such as OneFlow, took this insight into their design from day one and treat data communication as part of the overall computation graph of model training. This opens the door to more opportunities for performance optimization, and because such a framework does not have to maintain multiple training strategies (single-device vs. distributed training) as previous deep learning frameworks do, it can provide a simpler user interface in addition to better performance.
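As a rough illustration of the first trend in the list above, the sketch below shows operator fusion, one optimization that deep learning compilers commonly perform. The article does not name specific optimizations, so fusion is chosen here only as a familiar example, and the code is plain Python rather than the API of TVM, MLIR or Glow; `run_unfused`, `fuse` and `run_fused` are hypothetical helper names. Instead of running each elementwise operator as a separate pass that materializes an intermediate result, the fused version makes a single pass over the data.

```python
# Three elementwise "operators" (illustrative only).
def scale(v):
    return v * 2.0


def shift(v):
    return v + 1.0


def relu(v):
    return v if v > 0 else 0.0


def run_unfused(ops, data):
    """One full pass over the data, and one new list, per operator."""
    passes = 0
    for op in ops:
        data = [op(v) for v in data]
        passes += 1
    return data, passes


def fuse(ops):
    """Compose elementwise operators into a single per-element function."""
    def fused(v):
        for op in ops:
            v = op(v)
        return v
    return fused


def run_fused(ops, data):
    """A single pass over the data using the composed function."""
    kernel = fuse(ops)
    return [kernel(v) for v in data], 1


ops = [scale, shift, relu]
data = [-1.0, 0.5, 2.0]

unfused_out, unfused_passes = run_unfused(ops, data)
fused_out, fused_passes = run_fused(ops, data)
print(unfused_out, fused_out, unfused_passes, fused_passes)
```

Both versions compute the same values; the difference is how many times the data is traversed and how many intermediate buffers are created. A compiler that generates such fused code automatically, for whichever hardware it targets, removes the need to hand-write a kernel for every operator combination.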
We are at the dawn of an AI revolution. New research and applications in AI are being generated at an unprecedented pace. Eight years ago, AlexNet contained 60 million parameters; the most recent GPT-3 network contains 175 billion parameters, a roughly 3,000x increase in network size in 8 years! The human brain, on the other hand, contains an estimated 100 trillion parameters (aka synapses). This indicates that there is still a large gap for neural networks to close before reaching human-level intelligence, if that is ever possible.
This prohibitive network size poses a great challenge for efficient computation in both hardware and software, for model training as well as inference. The future deep learning framework is likely to be an interdisciplinary product of algorithms, high-performance computing, hardware accelerators and distributed systems.
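The size comparison quoted above can be checked with a few lines of arithmetic. The parameter and synapse counts are the figures given in the text; the variable names are just for illustration.

```python
alexnet_params = 60e6       # 60 million parameters (AlexNet)
gpt3_params = 175e9         # 175 billion parameters (GPT-3)
brain_synapses = 100e12     # estimated 100 trillion synapses

growth = gpt3_params / alexnet_params      # growth from AlexNet to GPT-3
remaining = brain_synapses / gpt3_params   # gap from GPT-3 to the brain estimate

print(f"AlexNet -> GPT-3: about {growth:,.0f}x")
print(f"GPT-3 -> brain estimate: about {remaining:,.0f}x")
```

The first ratio comes out at about 2,917, which the text rounds to 3,000x. The second is about 571, so by this rough measure the remaining gap is smaller than the growth already seen over the past eight years.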
[1] Alex Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks (2012), NeurIPS 2012
[2] Kaiming He et al., Deep Residual Learning for Image Recognition (2016), CVPR 2016
[3] Jacob Devlin et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
[4] Tom B. Brown et al., Language Models are Few-Shot Learners (2020), NeurIPS 2020
About Lin Yuan
Lin Yuan is a Staff Software Engineer at Waymo. He develops the machine learning platform for the perception and planning tasks of autonomous vehicles. Before joining Waymo, he worked at Amazon AI on large-scale distributed learning. He is a committer and major contributor to the Apache deep learning framework MXNet and the LF AI distributed learning library Horovod.
Before working in the AI domain, he had abundant experience in VLSI design and automation. He served as a session chair at the Design Automation Conference and on the Technical Program Committee of ICCAD. He received his Ph.D. in Computer Engineering from the University of Maryland, College Park.
Views expressed in this article do not represent the opinion of Synced Review or its editors.