The success of contemporary machine learning algorithms has been largely driven by increasing model and dataset sizes. Distributing computation across multiple devices has become the prevailing approach for scaling the training of these complex models on large-scale data.
Distributing the learning process, however, also complicates implementation, which can be problematic for the many machine learning practitioners unfamiliar with distributed-systems mechanisms, especially those involving complicated communication topologies.
In a new paper, a research team from DeepMind and Google Brain addresses this issue with Launchpad, a programming model that simplifies the process of defining and launching instances of distributed computation.
Launchpad describes the topology of a distributed system as a graph data structure in which each node represents a service, i.e., a fundamental unit of computation to be run. A handle is constructed as a reference to a node, representing a client to the yet-to-be-constructed service. An edge represents communication between two services, and is created when the handle associated with one node is given to another at construction time. In this way, Launchpad can define cross-service communication simply by passing around node handles. Launchpad's computational building blocks are represented by different service types, each of which is represented by node and handle classes specific to that type.
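The handle-passing idea above can be illustrated with a small self-contained sketch. Note this is a toy approximation of the concept, not Launchpad's actual API: the class names `Program`, `Node` and `Handle` here are illustrative stand-ins.

```python
# Toy sketch of Launchpad's graph-building idea (illustrative only, not the
# real Launchpad API): nodes are services, handles are references to
# yet-to-be-constructed services, and passing one node's handle to another
# node at construction time defines an edge of the graph.

class Handle:
    """Client-side reference to a service that has not been constructed yet."""
    def __init__(self, node):
        self.node = node

class Node:
    """A node in the program graph, wrapping one service (unit of computation)."""
    def __init__(self, name, service_fn, *arg_handles):
        self.name = name
        self.service_fn = service_fn
        self.arg_handles = arg_handles  # handles passed in imply graph edges

    def create_handle(self):
        return Handle(self)

class Program:
    """The program graph: nodes plus the edges implied by handle passing."""
    def __init__(self, name):
        self.name = name
        self.nodes = []

    def add_node(self, node):
        self.nodes.append(node)
        return node.create_handle()

    def edges(self):
        # An edge (a, b) exists when node b was constructed with a handle to a.
        return [(h.node.name, n.name) for n in self.nodes for h in n.arg_handles]

# Build a two-service topology: a worker that talks to a server.
program = Program('example')
server_handle = program.add_node(Node('server', lambda: None))
program.add_node(Node('worker', lambda server: None, server_handle))
print(program.edges())  # [('server', 'worker')]
```

Cross-service communication is thus defined purely by which handles are threaded into which constructors; no explicit edge list is ever written down.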
The lifecycle of a Launchpad program can be divided into three phases: setup, launch, and execution. The setup phase constructs the program data structure; in the launch phase, this data structure is processed to allocate resources, addresses, etc., and to initiate the launch of the specified services; the execution phase then runs the services, for example creating clients for inter-service communication.
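The three phases can be sketched with a minimal, self-contained example. This is an illustrative toy (using threads and queues as stand-ins for real services and addresses), not Launchpad's actual implementation; the function names `setup`, `launch` and `execute` are assumptions chosen to mirror the phase names.

```python
# Toy sketch of the three Launchpad phases (illustrative, not the real code):
# setup builds the program data structure, launch allocates resources and
# "addresses" for each service, and execution runs the services.

import queue
import threading

def setup():
    """Setup phase: describe the program as a data structure (name -> service)."""
    def server(inbox, outbox):
        outbox.put(inbox.get() * 2)   # blocks until the client sends a value
    def client(inbox, outbox):
        inbox.put(21)                 # send a request to the server
    return {'server': server, 'client': client}

def launch(program):
    """Launch phase: allocate 'addresses' (queues here) and start services."""
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=fn, args=(inbox, outbox))
               for fn in program.values()]
    for t in threads:
        t.start()
    return threads, outbox

def execute(threads, outbox):
    """Execution phase: services run; wait for them and collect the result."""
    for t in threads:
        t.join()
    return outbox.get()

threads, outbox = launch(setup())
result = execute(threads, outbox)
print(result)  # 42
```

The key point mirrored here is the separation of concerns: the program description is pure data, and only the launch phase binds it to concrete resources.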
Launchpad is implemented in the popular programming language Python, which simplifies defining the program and node data structures and launching them on individual platforms. The framework's design, however, could also be implemented in any other host language, including low-level languages such as C/C++.
The Launchpad programming model is rich enough to accommodate a wide variety of distributed systems, including parameter servers (Ahmed et al., 2012; Li et al., 2014), MapReduce (Dean and Ghemawat, 2008) and Evolution Strategies. The researchers provide detailed descriptions of how to apply Launchpad to these common distributed-system paradigms with concise code, illustrating the framework's capability to simplify the design of common machine learning algorithms and components in this research area.
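As one concrete paradigm, a parameter-server topology fits the handle-passing style naturally: a single server service holds the parameters, and each worker receives a handle to it. The sketch below is a hedged toy in plain Python (the `ParameterServer` class, the fixed gradient, and the learning rate are all illustrative assumptions, not code from the paper).

```python
# Toy sketch of a parameter-server topology in the Launchpad style
# (illustrative only): one parameter-server service owns the parameters,
# and workers holding a reference to it push gradient updates.

class ParameterServer:
    """Service holding shared parameters and applying incoming gradients."""
    def __init__(self, dim):
        self.params = [0.0] * dim

    def apply_gradient(self, grad, lr=0.5):
        # Simple SGD step: params <- params - lr * grad.
        self.params = [p - lr * g for p, g in zip(self.params, grad)]

    def get_params(self):
        return list(self.params)

def worker(ps, num_steps):
    # In Launchpad, `ps` would be a handle that resolves to a client for the
    # server at execution time; here it is just a direct object reference.
    for _ in range(num_steps):
        grad = [1.0, -1.0]  # stand-in for a gradient computed from local data
        ps.apply_gradient(grad)

# "Setup": one server whose reference is shared by two workers, then run them.
ps = ParameterServer(dim=2)
for _ in range(2):
    worker(ps, num_steps=2)
print(ps.get_params())  # [-2.0, 2.0]
```

In the real framework the workers would run as separate services, possibly on different machines, with the handle transparently providing the client-side communication.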
Overall, Launchpad is a practical, user-friendly and expressive framework that lets machine learning researchers and practitioners specify distributed systems, one its creators believe can provide valuable assistance in designing today's increasingly complicated distributed learning algorithms.
The paper Launchpad: A Programming Model for Distributed Machine Learning Research is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang