Deep learning’s recent impressive breakthroughs have largely been based on huge quantities of manually annotated training data. Because appropriate large labelled datasets are often simply not available and manual labelling takes a lot of time, deep learning researchers are always on the lookout for ways to achieve the desired results with smaller amounts of data.
Moreover, solving this data challenge can enable AI practitioners with limited resources to perform faster and cheaper model customization.
Few-shot classification — learning to recognize new classes from only a handful of labelled examples — is a crucial component in the effort to teach models with limited data. Although this is a hot research field, previous benchmarks could not reliably compare different models, which has hindered research progress.
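Few-shot classifiers are typically trained and evaluated on "episodes": small tasks consisting of a labelled support set and a query set to classify. A minimal sketch of episode sampling under the common fixed N-way, k-shot setup (the function and toy data below are illustrative only; Meta-Dataset itself varies the number of classes and shots per episode):

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, query_size=10, seed=None):
    """Sample one N-way, k-shot episode from a {class: [examples]} mapping.

    Returns a support set (k_shot labelled examples per class) and a
    query set (query_size examples per class) over n_way sampled classes.
    """
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for label in classes:
        examples = rng.sample(dataset[label], k_shot + query_size)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query

# Toy dataset: 20 classes with 30 examples each.
toy = {f"class_{c}": [f"img_{c}_{i}" for i in range(30)] for c in range(20)}
support, query = sample_episode(toy, n_way=5, k_shot=1, query_size=10, seed=0)
```

A model is fit (or adapted) on the support set and scored on the query set, and this is repeated over many episodes.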
In a paper published at ICLR 2020 this month, Google AI researchers introduce Meta-Dataset, a large-scale and diverse benchmark for measuring the generalization ability of few-shot classification models. The team also proposes a new set of baselines to quantify the benefit of meta-learning in Meta-Dataset, which spans ten publicly available datasets: ImageNet (ILSVRC-2012), Omniglot, Aircraft, CUB-200-2011, Describable Textures, Quick Draw, Fungi, VGG Flower, Traffic Signs, and MSCOCO.
The researchers evaluated pretraining and meta-learning models on Meta-Dataset and found that existing methods perform poorly when training data comes from heterogeneous sources. The team compared models trained on only the ImageNet training classes against models trained on all Meta-Dataset training classes. After training on all datasets (including those visually dissimilar to ImageNet), performance on Omniglot and Quick Draw tasks improved significantly. For tasks on natural-image datasets, however, training on ImageNet alone yielded similarly high accuracy.
The researchers also examined how different models perform as the number of examples available in each test task varies. Some models outperform others when only a few examples are given but show little improvement as more examples are added; others perform poorly with only a few examples but improve quickly as more are provided.
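This kind of shot-dependence analysis can be illustrated with a simple baseline evaluated at increasing shot counts. The sketch below uses a nearest-centroid classifier on toy 2-D data, in the spirit of prototype-based methods; it is not the paper's setup, where models operate on learned image embeddings rather than raw features:

```python
import numpy as np

def nearest_centroid_accuracy(support_x, support_y, query_x, query_y):
    """Classify each query point by its nearest class centroid and
    return accuracy. Illustrative stand-in for a few-shot classifier."""
    classes = np.unique(support_y)
    centroids = np.stack(
        [support_x[support_y == c].mean(axis=0) for c in classes])
    # Distance from every query point to every class centroid.
    dists = np.linalg.norm(query_x[:, None, :] - centroids[None, :, :], axis=-1)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == query_y).mean())

# Toy 2-way tasks: two well-separated Gaussian clusters in 2-D,
# evaluated at increasing shot counts.
rng = np.random.default_rng(0)
results = {}
for k_shot in (1, 5, 20):
    sx = np.concatenate([rng.normal(0, 1, (k_shot, 2)),
                         rng.normal(5, 1, (k_shot, 2))])
    sy = np.array([0] * k_shot + [1] * k_shot)
    qx = np.concatenate([rng.normal(0, 1, (50, 2)),
                         rng.normal(5, 1, (50, 2))])
    qy = np.array([0] * 50 + [1] * 50)
    results[k_shot] = nearest_centroid_accuracy(sx, sy, qx, qy)
```

Plotting accuracy against shot count for several models on real episodes is what reveals the crossover behaviour the researchers describe.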
The researchers hope to promote future ML research through further exploration of Meta-Dataset. Although the team uncovered promising directions for meta-learning across heterogeneous data, they stress that "it remains unclear what is the best strategy for creating training episodes, the most appropriate validation creation and the most appropriate initialization."
The project code is open-sourced, and further information is available on the GitHub project page. The paper Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples is on arXiv.
Author: Herin Zhao | Editor: Michael Sarazen