When researchers create a new machine learning (ML) model, they seek to avoid the overfitting problem, where the model performs well only on data distributions similar to those in its training set. Modern research therefore increasingly focuses on building models that are robust to distribution shifts and can generalize effectively to new data and scenarios. This is especially important in applications such as self-driving vehicles and medical imaging, where a model’s failure to adapt to distribution shifts can not only undermine confidence in the system but also endanger its users.
There has, however, been only limited research on when and why models generalize, or on evaluating the robustness of different algorithms across different distribution shifts. To bridge this gap, in the new paper A Fine-Grained Analysis on Distribution Shift, a DeepMind research team presents a framework for the fine-grained analysis of various distribution shifts and provides insights into when and why we can expect models to generalize successfully.
The team summarizes their main contributions as:
- We propose a framework to define when and why we expect methods to generalize. We use this framework to define three real-world inspired distribution shifts. We then use this framework to create a systematic evaluation setup across real and synthetic datasets for different distribution shifts. Our evaluation framework is easily extendable to new distribution shifts, datasets, or methods to be evaluated.
- We evaluate and compare 19 different methods (training more than 85K models) in these settings. These methods span the following five common approaches: architecture choice, data augmentation, domain generalization, adaptive algorithms, and representation learning. This allows for a direct comparison across different areas in machine learning.
- We find that simple techniques, such as data augmentation and pretraining are often effective and that domain generalization algorithms do work for certain datasets and distribution shifts. However, there is no easy way to select the best approach a priori, and results are inconsistent over different datasets and attributes, demonstrating there is still much work to be done to improve robustness in real-world settings.
The paper starts with the question: “Can we define the important distribution shifts to be robust to and then systematically evaluate the robustness of different methods?” Taking inspiration from the disentanglement literature, which separates images into an independent set of factors of variation (or attributes), the researchers posit that models that have seen some distribution of values for a given attribute may learn invariance to that attribute, enabling them to generalize to unseen examples of that attribute and to different distributions over it. They identify three approaches for maintaining model performance when the data-generating distribution changes: weighted resampling, data augmentation, and representation learning.
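Of these, weighted resampling is the simplest to illustrate. The toy sketch below (our own construction, not the paper’s code) rebalances a training set over a single binary attribute by drawing examples with probability inversely proportional to the frequency of their attribute value, so that each value is seen equally often during training:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical attribute labels for a training set: value 0 is
# over-represented (80%) relative to value 1 (20%).
attributes = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# Weight each example by the inverse frequency of its attribute value,
# then normalize the weights into a sampling distribution.
values, counts = np.unique(attributes, return_counts=True)
inv_freq = {v: 1.0 / c for v, c in zip(values, counts)}
weights = np.array([inv_freq[a] for a in attributes])
weights /= weights.sum()

# Resampled training batches are now balanced over the attribute.
batch_idx = rng.choice(len(attributes), size=10_000, replace=True, p=weights)
balanced_fraction = attributes[batch_idx].mean()  # close to 0.5
```

The same idea extends to multiple attributes by weighting each example by the inverse frequency of its attribute combination, at the cost of higher variance when some combinations are rare.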
The team evaluates models with regard to three common real-world distribution shifts, which they consider building blocks of more complex distribution shifts: 1) spurious correlation, which arises from capture bias, environmental factors, and geographical bias; 2) low-data drift, where the training data is not captured uniformly across different attribute values; and 3) unseen data shift, where a model trained in one specific setting is expected to work in another, disjoint setting. They also evaluate 19 algorithms and provide a detailed analysis of how well each achieves robustness to these three distribution shifts.
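To make the three shift types concrete, the following toy construction (our own illustration over a single binary attribute `a` and binary label `y`, not one of the paper’s datasets) shows how each can be simulated from attribute-labeled data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
y = rng.integers(0, 2, size=n)          # binary labels
a_test = rng.integers(0, 2, size=n)     # at test time, attribute is independent of y

# 1) Spurious correlation: at train time the attribute tracks the label
#    (a == y for ~95% of examples), so a model may latch onto it.
a_train_spurious = np.where(rng.random(n) < 0.95, y, 1 - y)

# 2) Low-data drift: one attribute value is heavily under-sampled at
#    train time but equally represented at test time.
keep = (a_test == 0) | (rng.random(n) < 0.05)
a_train_low_data = a_test[keep]

# 3) Unseen data shift: one attribute value never appears at train time,
#    yet the model must handle it at test time.
a_train_unseen = a_test[a_test == 0]
```

Viewing the three shifts as building blocks in this way is what lets the authors compose a systematic evaluation grid over datasets, attributes, and methods.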
The experiments show that while simple techniques such as data augmentation and pretraining are effective for domain generalization, it is difficult or even impossible to decide a priori on the optimal method given only a dataset. The team suggests pinpointing the precise distribution shift in a given application (thus identifying methods to explore) as an important future research avenue. They also observe that the robustness results are inconsistent over different datasets and attributes, and so propose focusing on cases where there is knowledge about the distribution shift and employing adaptive algorithms that can use auxiliary information if available. Finally, they stress the importance of evaluating methods in a variety of conditions, as performance varies depending on the number of examples, amount of noise, and size of the dataset.
The team hopes their general, comprehensive framework for reasoning about distribution shifts can help ML practitioners evaluate which methods work best under which conditions and shifts, and encourage additional research in this area.
The paper A Fine-Grained Analysis on Distribution Shift is on arXiv.
Author: Hecate He | Editor: Michael Sarazen