Modern work on artificial intelligence tends to focus on neural networks, machine learning algorithms, and related techniques. Most of the effort in both industry and academia goes into modeling and optimizing the performance of learning algorithms and network architectures, which is essentially a learner's perspective. Recently, a Microsoft Research group proposed pursuing what machine learning systems try to achieve from a teacher's perspective instead, through a universally accessible, easy-to-manage, and fast discipline they call Machine Teaching (MT). Their paper lays out the position of the machine teaching discipline and articulates its fundamental principles.
The supply-and-demand relationship between machine learning (ML) expertise and the people who need ML models is badly off-balance. Accurate, targeted models must be specialized to each field, so the aim is to enable a growing number of "machine teachers" to carry out the process, making the teaching of machines easy, fast, and universally accessible. While most researchers focus on improving accuracy or creating new algorithms, machine teaching focuses on the productivity of the teachers. Traditional machine learning does not address the problems that limit this productivity. In machine teaching, concepts (targets) can be decomposed into sub-concepts that the teacher can easily access and reversibly manipulate, even though labeling sub-concepts benefits only the teacher rather than the learning algorithm.
Machine teaching remains directly connected to machine learning fundamentals, but it emphasizes the teacher's actions, interactions with data, and the design principles of interaction and visualization; its goal is therefore to interconnect machine learning with HCI and related systems.
2. Why Do We Need a Machine Teaching Discipline
To build a machine learning model, the necessary steps are:
1. A problem owner collects data, writes labeling guidelines, and optionally contributes some labels
2. The problem owner outsources the task of labeling a large portion of the data (e.g., 50,000 examples)
3. The problem owner examines the labels and may discover that the guidelines are incorrect or that the sampled examples are inappropriate or inadequate for the problem. When that happens, go back to step 1
4. An ML expert is consulted to select the algorithm (e.g., deep neural network), the architecture (e.g., number of layers, units per layer, etc.), the objective function, the regularizers, the cross-validation sets, etc.
5. Engineers adjust existing features or create new features to improve performance. Models are trained and deployed on a fraction of traffic for testing
6. If the system does not perform well on test traffic, go back to step 1
7. The model is deployed on full traffic. Performance of the model is monitored, and if that performance goes below a critical level, the model is modified by returning to step 1
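The iterative loop described above can be sketched as simple control flow. The function and parameter names below (and the accuracy threshold) are illustrative placeholders, not from the paper:

```python
def build_ml_model(collect_data, label, train, evaluate,
                   target_accuracy=0.9, max_iterations=10):
    """Sketch of the iterative model-building loop described above.

    All callables and the accuracy threshold are hypothetical stand-ins
    for the human and engineering work in each step.
    """
    for _ in range(max_iterations):
        data = collect_data()            # step 1: problem owner collects data
        labeled = label(data)            # steps 2-3: outsourced labeling, guideline review
        model = train(labeled)           # steps 4-5: algorithm choice, feature engineering
        accuracy = evaluate(model)       # steps 6-7: test traffic, monitoring
        if accuracy >= target_accuracy:  # acceptable: deploy and monitor
            return model
        # otherwise the guidelines or features were inadequate: back to step 1
    raise RuntimeError("no acceptable model after max_iterations")
```

The point of the sketch is that every failed check restarts the whole loop, which is why each iteration can cost weeks.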
Iterating through steps 1 to 6 takes weeks, whereas the system can remain stable at the last step for months. Failures can be disastrous, and the model may break for a variety of reasons. Since experts with different backgrounds are involved, it takes tremendous effort and coordination to figure out why a retrained model does not perform as expected. Building a machine learning model therefore involves not only collecting data and applying algorithms, but also managing the process of building machine learning solutions, which can be fraught with inefficiencies.
2.1 Definitions of Machine Learning and Machine Teaching
Definition 2.1 (Machine learning research). “Machine Learning research aims at making the learner better by improving ML algorithms.”
Definition 2.2 (Machine teaching research). “Machine teaching research aims at making the teacher more productive at building machine learning models.”
2.2 Decoupling Machine Teaching from Machine Learning
A machine teaching solution may use one or more machine learning algorithms, which can be regarded as "compilers" that convert the teacher's work into models throughout the teaching process. Decoupling the two disciplines reduces maintenance time and the expertise required; the goal is "write once, compile anywhere". The authors also impose additional system requirements on machine teaching.
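"Write once, compile anywhere" can be illustrated as follows. The `teach` function and the two toy learners are hypothetical stand-ins, not the paper's API; the point is that the teacher's work (examples, labels, features) is unchanged when the learning algorithm underneath is swapped:

```python
def teach(examples, labels, featurize, learner):
    """'Compile' a teacher's labels and features into a model with any learner."""
    X = [featurize(e) for e in examples]
    return learner(X, labels)

# Two interchangeable "compilers" (toy stand-ins for real learning algorithms):
def majority_learner(X, y):
    """Predict the most common label regardless of input."""
    majority = max(set(y), key=y.count)
    return lambda x: majority

def threshold_learner(X, y):
    """Learn a threshold on the first feature (binary labels 0/1 assumed)."""
    pos = [x[0] for x, label in zip(X, y) if label == 1]
    neg = [x[0] for x, label in zip(X, y) if label == 0]
    t = (min(pos) + max(neg)) / 2
    return lambda x: 1 if x[0] >= t else 0
```

Swapping `majority_learner` for `threshold_learner` changes the model but not the teaching work, which is the decoupling the authors argue for.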
3. Analogy to Programming
3.1 Commonalities and Differences between Programming and Teaching
- The target function needs to be specified
- The target function can be decomposed into sub-functions
- Functions (including sub-functions) need to be tested and debugged
- Functions can be documented
- Functions can be shared
- Functions can be deployed
- Functions need to be maintained (scheduled and unscheduled debug cycles)
Differences are summarized in Table 1.
3.2 Programming Paving the Way Forward
The history of programming offers several lessons that apply to machine teaching and can guide the construction of the discipline.
Like programming, machine teaching aims to create a function by recursively decomposing the task into subtasks, until each subtask is straightforward enough to be solved directly.
The keys to scaling to multiple contributors are programming languages, interfaces (APIs), and version control. Componentization and interfaces allow a separation of concerns that reduces development complexity and enables independent development and innovation.
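A minimal sketch of what componentization could look like for teaching, assuming a small shared interface (the `Concept` class and the example concepts are hypothetical, not the paper's API). Each contributor implements the same contract independently, so concepts can be tested, shared, and composed:

```python
from abc import ABC, abstractmethod

class Concept(ABC):
    """Interface: a concept maps any example to a label value."""
    @abstractmethod
    def label(self, example):
        ...

class ContainsDigit(Concept):
    """A simple concept one contributor might own."""
    def label(self, example: str) -> bool:
        return any(ch.isdigit() for ch in example)

class AllCaps(Concept):
    """Another concept, developed independently behind the same interface."""
    def label(self, example: str) -> bool:
        return example.isupper()
```

The stable interface plays the role an API plays in programming: contributors can innovate inside a concept without breaking its consumers.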
Table 2 maps programming tools and concepts to their machine teaching counterparts.
Much of the machine teaching effort is currently undertaken by experts in machine learning and statistics, and the discipline is young and in its formative stages. Its growth is likely to continue at an even quicker pace, and it might be the path to bringing machine learning to the masses.
4. Roles of the Machine Teacher

Definition 4.1 (Concept). “A concept is a mapping from any example to a label value.”
Definition 4.2 (Feature). “A feature is a concept that assigns each example a scalar value.”
Definition 4.3 (Teacher). “A teacher is the person who transfers concept knowledge to a learning machine.”
To clarify this definition of a teacher, knowledge transfer is defined as a combination of example selection (which may be biased), labeling, schema definition (the relationships between labels), featuring, and concept decomposition (where features are recursively defined as sub-models).
Figure 1 shows how concepts, labels, features, and teachers are related.
Definition 4.4 (Selection). “Selection is the process by which teachers gain access to an example that exemplifies useful aspects of a concept.”
Definition 4.5 (Label). “A label is an (example, concept value) pair created by a teacher in relation to a concept.”
Definition 4.6 (Schema). “A schema is a relationship graph between concepts.”
Definition 4.7 (Generic feature). “A generic feature is a set of related feature functions.”
Definition 4.8 (Decomposition). “Decomposition is the act of using simpler concepts to express more complex ones.”
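Decomposition (Definition 4.8) can be sketched as ordinary function composition. The concepts below are invented examples, not from the paper; the point is that each sub-concept can be labeled, tested, and improved by the teacher on its own:

```python
# Sub-concepts: simple functions from an example to a value.
def has_price(text: str) -> bool:
    """Does the text mention a price?"""
    return "$" in text

def mentions_shipping(text: str) -> bool:
    """Does the text mention shipping?"""
    return "shipping" in text.lower()

# Complex concept expressed through simpler ones (decomposition).
def is_product_listing(text: str) -> bool:
    return has_price(text) and mentions_shipping(text)
```

If `is_product_listing` misbehaves, the teacher can debug `has_price` and `mentions_shipping` separately, exactly as a programmer debugs sub-functions.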
All the definitions above describe the key roles of a machine teacher. The rest of this section addresses how to meet the demand for such teachers.
One way is to train more machine learning scientists, but training them is slow and expensive. Moreover, machine learning scientists and data scientists are better employed inventing and optimizing learning algorithms, where their expertise applies.
5. Teaching Phase
The following principles for the language and process of machine teaching are proposed in this paper:
- Universal teaching language
- Feature completeness
- Rich and diverse sampling set
- Distribution robustness
- Modular development
- Version control
All these principles suggest a teaching process different from the standard teaching process.
Based on the principles explained above, a skeleton for a teaching process is proposed in Algorithm 1.
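The paper's Algorithm 1 is not reproduced here; the following is a hedged sketch of a teaching loop consistent with the principles above (diverse sampling, labeling, featuring, retraining until the teacher is satisfied). All names are illustrative, not the authors' notation:

```python
def teaching_loop(sample, add_label, train, inspect_errors, add_feature,
                  satisfied, max_rounds=100):
    """Illustrative skeleton of a teaching process (not the paper's Algorithm 1)."""
    labels, features = [], []
    model = None
    for _ in range(max_rounds):
        example = sample()                    # rich and diverse sampling set
        labels.append(add_label(example))     # teacher labels the example
        model = train(labels, features)       # the learner acts as a "compiler"
        errors = inspect_errors(model)        # teacher reviews the model's mistakes
        if satisfied(errors):
            return model
        features.append(add_feature(errors))  # fix errors by featuring/decomposition
    return model
```

Each pass through the loop is a small, reversible teaching action, in contrast to the weeks-long iterations of the traditional process.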
6. Conclusion
The field of machine learning has spent most of its effort developing and improving learning algorithms, and when data are plentiful this approach plays an important role. As the demand for machine learning solutions grows, however, access to the teachers who build those solutions becomes the constraint. To meet this demand, the authors propose advancing the discipline of machine teaching.
7. Technical Comments
Machine learning algorithms are invented and implemented on a daily, if not hourly, basis. While most scientists and companies focus on developing learning algorithms and optimizing learning efficiency, Machine Teaching, as proposed by the Microsoft Research group, draws attention from a different perspective. Admittedly, machine teaching is not yet mature and needs more effort to refine its principles and corresponding discipline. However, this could be the chance to reconstruct the fundamental framework of machine learning so that it is easy, fast, and accessible to "teachers".
Although this paper explains and defines what machine teaching is, more groundwork is needed to support the position with experimental evidence. Theoretical definitions and explanations alone only suggest that machine teaching could be more efficacious than a single machine learning algorithm; the research group still needs to demonstrate that the concept of machine teaching they propose is the next focal point of the machine learning field.
Paper Authors: Patrice Y. Simard, Saleema Amershi, David M. Chickering, Alicia Edelman Pelton, Soroush Ghorashi, Christopher Meek, Gonzalo Ramos, Jina Suh, Johan Verwey, Mo Wang, and John Wernsing, Microsoft Research
Author: Bin Liu | Editor: Zhen Gao | Localized by Synced Global Team: Xiang Chen