One of the significant challenges for deploying machine learning (ML) systems in the wild is distribution shifts — changes and mismatches in data distributions between training and test times. To address this, researchers from Stanford University, University of California-Berkeley, Cornell University, California Institute of Technology, and Microsoft, in a recent paper, present “WILDS,” an ambitious benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications.
Although distribution shifts naturally occur over time in the real world, most datasets currently in use by the ML community draw their training and test sets from the same distribution and were not designed with distribution shifts in mind. The researchers note that previous studies which retrofitted existing datasets with synthetic distribution shifts did not always reflect the shifts encountered in the real world.
WILDS builds on top of recent data collection efforts by domain experts in applications such as tumour identification, wildlife monitoring and poverty mapping, presenting a unified collection of datasets with evaluation metrics and train/test splits that the researchers believe are representative of real-world distribution shifts.
The researchers focus on two types of distribution shift: domain generalization, where the training and test distributions comprise related but distinct domains, e.g. patient records from different hospitals; and subpopulation shift, where the test distribution is a subpopulation of the training distribution, e.g. an underrepresented demographic group.
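The two shift types above can be illustrated with a minimal, self-contained sketch (toy data and names are hypothetical, not from the WILDS package): for domain generalization, entire domains are held out of training; for subpopulation shift, all domains are seen during training but evaluation focuses on the worst-performing group.

```python
# Toy records: each example carries a "domain" label (e.g. the hospital it came from).
records = [{"domain": d, "x": i}
           for d in ["hospital_A", "hospital_B", "hospital_C"]
           for i in range(5)]

# Domain generalization: hold out entire domains for testing,
# so the model never sees any example from the test hospitals.
test_domains = {"hospital_C"}
dg_train = [r for r in records if r["domain"] not in test_domains]
dg_test = [r for r in records if r["domain"] in test_domains]

# Subpopulation shift: every domain appears at training time, but the
# evaluation metric is accuracy on the worst-performing subpopulation
# rather than average accuracy over all examples.
def worst_group_accuracy(per_group_acc):
    return min(per_group_acc.values())

# Hypothetical per-hospital accuracies for some trained model.
acc = {"hospital_A": 0.95, "hospital_B": 0.90, "hospital_C": 0.70}
print(worst_group_accuracy(acc))  # 0.7
```

A model with high average accuracy can still score poorly on this worst-group metric, which is exactly the failure mode subpopulation-shift benchmarks are designed to expose.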
There are currently seven datasets in the WILDS collection, reflecting distribution shifts that arise from differences in demographics, users, hospitals, camera locations, countries, time periods and molecular scaffolds, all of which can cause substantial performance drops in baseline models.
Most of the WILDS datasets have been substantially modified to make them more user-friendly and consistent, the researchers say. By including datasets from a variety of application areas and making them accessible to the ML community through careful preprocessing, standardized evaluations, and shared infrastructure, the researchers hope to encourage the development of general-purpose methods that are anchored to real-world distribution shifts, and that can work well across different applications and problem settings.
The researchers also discuss each dataset’s broader context and relation to other tasks and distribution shifts in the same application area, and identify other application areas — algorithmic fairness and policing, medicine and healthcare, natural language and speech processing, code, education, and robotics — as promising sources for future additions to the benchmark.
The paper WILDS: A Benchmark of in-the-Wild Distribution Shifts is on arXiv. The WILDS Python package and additional information are available on the Stanford University website, and the project code is on GitHub.
Reporter: Yuan Yuan | Editor: Michael Sarazen
Synced Report | A Survey of China’s Artificial Intelligence Solutions in Response to the COVID-19 Pandemic — 87 Case Studies from 700+ AI Vendors
This report offers a look at how China has leveraged artificial intelligence technologies in the battle against COVID-19. It is also available on Amazon Kindle. Along with this report, we also introduced a database covering an additional 1,428 artificial intelligence solutions across 12 pandemic scenarios.
Click here to find more reports from us.
We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.