Detecting performance drift in machine learning (ML) models is a crucial yet challenging task. Such drift is typically caused by data drift, and because data labels are often difficult or expensive to obtain, an effective way to identify this degradation without labels would be highly valuable in ML operations.
In the new paper Machine Learning Model Drift Detection Via Weak Data Slices, an IBM Research team proposes a novel drift detection method based on feature-space rules called “data slices,” and provides empirical validation of its effectiveness.
The team summarizes their main contributions as:
- Provide a feature space method for drift detection, including the definition of a data slices-based distribution for drift detection.
- Provide initial evidence for the effectiveness of our drift detection method and characterize the types of data drift it is effective in detecting.
A weak data slice is a data slice for which the misclassification rate (MCR) is significantly higher than the overall MCR of the ML model on the test dataset. In this work, the researchers propose that weak data slices can be used to define an empirical distribution for drift detection.
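As a rough illustration of this definition, the sketch below mines weak slices from a labelled test set. It is a hypothetical, simplified helper, not the paper's actual algorithm: each slice here is a single categorical feature-value predicate, and the `margin` and `min_support` thresholds are assumptions standing in for a proper significance test.

```python
# Hypothetical sketch: mine "weak" single-feature slices, i.e. slices whose
# misclassification rate (MCR) notably exceeds the model's overall MCR.

def overall_mcr(errors):
    """errors: list of 0/1 flags, 1 = the model misclassified that example."""
    return sum(errors) / len(errors)

def find_weak_slices(rows, errors, margin=0.1, min_support=5):
    """rows: list of dicts mapping feature name -> categorical value,
    aligned with errors. Returns (feature, value, slice_mcr) triples
    whose MCR exceeds the overall MCR by at least `margin`."""
    base = overall_mcr(errors)
    weak = []
    for feature in rows[0].keys():
        for value in {r[feature] for r in rows}:
            idx = [i for i, r in enumerate(rows) if r[feature] == value]
            if len(idx) < min_support:      # skip tiny, noisy slices
                continue
            slice_mcr = sum(errors[i] for i in idx) / len(idx)
            if slice_mcr >= base + margin:
                weak.append((feature, value, slice_mcr))
    return weak
```

For example, if 8 of 10 "red" examples are misclassified against an overall MCR of 0.4, the slice `color == red` (MCR 0.8) would be flagged as weak.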
The weak slices considered here represent regions of the feature space where the ML model is likely to err. Intuitively, the more of a given dataset these slices cover, the more errors the ML model is likely to produce. The proposed method aims to detect data distribution changes that are likely to affect ML model results, and to identify likely performance degradation from these changes without any labels. Both goals are achieved via statistical hypothesis testing, a common approach for comparing two datasets in drift detection.
The researchers use weak data slices to nonparametrically compare two datasets (D1 and D2) that share the same feature set. To detect data distribution changes, they test the null hypothesis that the two datasets' distributions are the same; rejecting it indicates a change in the feature distributions within the problematic regions the slices identify. Similarly, to identify likely performance degradation, the hypothesis is that D2's distribution over the weak slices is strictly larger than D1's. Accepting this hypothesis indicates that D2's mistakes are likely growing relative to D1's, while rejecting it indicates they are shrinking.
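A minimal sketch of this kind of label-free comparison is shown below. It is an assumed simplification, not the paper's exact test: for a single weak slice it computes a one-sided two-proportion z-statistic on how often points of D1 versus D2 fall inside the slice; the `slice_shift_z` and `in_slice` helpers and the single-predicate slice format are illustrative inventions.

```python
import math

# Hypothetical sketch: one-sided two-proportion z-test on weak-slice
# membership. H0: D1 and D2 land in the slice at the same rate;
# H1: D2 lands in it more often (suggesting growing mistakes).

def in_slice(row, predicate):
    """predicate: (feature, value) pair defining a single-feature slice."""
    feature, value = predicate
    return row[feature] == value

def slice_shift_z(d1, d2, predicate):
    """Return the z-statistic for D2's slice-membership rate exceeding D1's.
    A large positive value suggests D2 concentrates more mass in the
    weak (error-prone) region than D1 does."""
    n1, n2 = len(d1), len(d2)
    p1 = sum(in_slice(r, predicate) for r in d1) / n1
    p2 = sum(in_slice(r, predicate) for r in d2) / n2
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se if se > 0 else 0.0
```

If D2 shows, say, 50% of its points inside a weak slice where D1 had only 10%, the statistic is clearly positive, flagging likely degradation without consulting a single label.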
To evaluate their approach, the team conducted experiments on three datasets: Adult, Anuran and MP.
The results show that the proposed drift detection method is effective at detecting both distributional changes and increases in misclassifications, and that it identifies such drifts even when the underlying univariate feature distributions are unchanged.
In future research, the team plans to investigate the relationship between drift magnitude and risk resulting from ML model degradation, as well as the use of effect size rather than p-values in their tests.
The paper Machine Learning Model Drift Detection Via Weak Data Slices is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang