
Sotabench: Benchmarking Open Source Models Directly From GitHub

Machine learning research resource Papers with Code last week introduced Sotabench, a free and open website created to benchmark and rate the performance of state-of-the-art open source models from GitHub. Papers with Code is an open platform that aggregates ML papers, code, and evaluation tables and metrics from sources such as arXiv and GitHub.

The team behind Sotabench has implemented an initial set of benchmarks on the site and is encouraging anyone and everyone to contribute additional benchmarks and connect their GitHub repositories. Contributors get access to free GPUs to run their code on the public benchmarks, and the measured results are compared against the numbers reported in the corresponding papers to check reproducibility.
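Connecting a repository is designed to be lightweight: based on the project's documentation at the time, a benchmark script (sotabench.py) at the root of the repo declares which model to evaluate on which benchmark. The sketch below shows roughly what such a script could look like for the ImageNet benchmark using the team's torchbench helper library; the specific model, arXiv ID, preprocessing values, and batch size are illustrative assumptions, not details from the announcement.

```python
# sotabench.py -- illustrative benchmark hook, assuming the torchbench helper library
import PIL
import torchvision.transforms as transforms
from torchvision.models.resnet import resnext101_32x8d

from torchbench.image_classification import ImageNet

# Standard ImageNet evaluation preprocessing (illustrative values)
input_transform = transforms.Compose([
    transforms.Resize(256, PIL.Image.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Evaluate the model on ImageNet and compare the measured numbers
# against the linked paper, identified here by its arXiv ID.
ImageNet.benchmark(
    model=resnext101_32x8d(pretrained=True),
    paper_model_name='ResNeXt-101-32x8d',
    paper_arxiv_id='1611.05431',
    input_transform=input_transform,
    batch_size=256,
    num_gpu=1,
)
```

When the connected repository is updated, Sotabench's servers re-run this script on their GPUs and publish the measured accuracy and speed alongside the figures reported in the paper.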

So far there are eight benchmarks on the Sotabench website:

  • ImageNet (Image Classification)
  • COCO Minival (Object Detection)
  • WMT2014 English-German (Machine Translation)
  • WMT2019 English-German (Machine Translation)
  • WikiText-103 (Language Modelling)
  • SQuAD1.1 dev (Question Answering)
  • WMT2014 English-French (Machine Translation)
  • SQuAD2.0 dev (Question Answering)

Each benchmark page includes a Leaderboard summarizing and ranking existing models; a list of models on Papers with Code that have not yet been tested; and instructions on how to contribute models.

For example, the ImageNet benchmark page shows a graph plotting top-1 accuracy against images per second for all tested models, along with a list of model details including repository, top-5 accuracy, speed, paper and, most importantly, whether the results reproduce the reported accuracy.

The leaderboard will be a convenient place for practitioners and researchers to compare the tradeoff between speed and accuracy of the many models out there.

As model speed goes up, top-1 accuracy and reproducibility rates tend to drop.

Sotabench’s automation and standardization of dataset acquisition, pre-processing and evaluation will help newcomers get started in ML and enable reviewers to easily compare models across different metrics.
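To make that concrete, the following is a generic sketch of the kind of evaluation boilerplate a researcher would otherwise write by hand, and which Sotabench folds behind a single benchmark call. It uses only standard PyTorch/torchvision APIs and assumes the ImageNet validation set is already available at a local path (./imagenet/val here is a hypothetical location):

```python
# Generic sketch of the manual ImageNet evaluation loop that Sotabench
# automates: data loading, preprocessing, and top-1 accuracy computation.
# Assumes the ImageNet validation images are organized by class folder
# under ./imagenet/val (a hypothetical path).
import torch
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
from torchvision.models import resnet18

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

loader = DataLoader(ImageFolder('./imagenet/val', transform=transform),
                    batch_size=128, num_workers=4)

model = resnet18(pretrained=True).eval()

correct = total = 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)

print(f'top-1 accuracy: {correct / total:.4f}')
```

Every line of this, from dataset handling to the accuracy metric, is a place where independent reimplementations can silently diverge, which is exactly the variance a centralized benchmark runner removes.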

There are, of course, always doubters. Because Sotabench is a free and open platform where anyone can contribute, some Reddit users have questioned its credibility and pointed to the risk of contributors gaming their results. One suggested establishing a report-and-review system to prevent this. It would however be unwise to cheat on such a public platform, as the deceit would be easy to spot. The majority of reactions to Sotabench on Reddit and elsewhere have been positive, with many expressing their willingness to contribute to the centralized benchmark project.

Papers with Code is a free resource website supported by Atlas ML, a startup founded by researchers and engineers from the University of Cambridge. The new Sotabench website is at sotabench.com.


Author: Reina Qi Wan | Editor: Michael Sarazen
