Synced

ArXiv’s 1.7M+ Research Papers Now Available on Kaggle

To help make the world’s largest free scientific paper repository even more accessible, arXiv announced yesterday that all of its research papers are now available on Kaggle.

Launched in 1991 by Paul Ginsparg as a preprint physics archive, arXiv is hosted by Cornell University and has become an indispensable platform providing free and open access to research for the computer science and machine learning communities and beyond. Its new collaboration with Kaggle, the world’s largest data science community, provides a free and open pipeline to the machine-readable arXiv dataset of some 1.7 million articles.

The Kaggle dataset mirrors the original arXiv paper data.

“By offering the dataset on Kaggle we go beyond what humans can learn by reading all these articles and we make the data and information behind arXiv available to the public in a machine-readable format,” said arXiv Executive Director Eleonora Presani in a press release.

Kaggle is a regular destination for data scientists and machine learning engineers seeking interesting datasets, public notebooks, information on competitions and so on. Researchers can utilize Kaggle’s extensive data exploration tools to share relevant scripts and outputs with others.

As a burgeoning knowledge-sharing platform, arXiv benefits from constant innovation regarding information presentation and interpretation, and Presani believes additional input from Kaggle’s massive user base can help push the limits of this innovation.

It’s hoped the arXiv and Kaggle collaboration will empower new use cases and lead to the exploration of richer machine learning techniques that combine multi-modal features in applications such as trend analysis, paper recommender engines, category prediction, co-citation networks, knowledge graph construction, semantic search interfaces and more.
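As a rough illustration of the trend-analysis use case, the sketch below counts papers per year and primary category from JSON-lines metadata. It assumes each record carries an `update_date` string in `YYYY-MM-DD` form and a space-separated `categories` string; those field names, the helper `count_by_year_and_category`, and the sample records are assumptions made for illustration and should be checked against the actual dataset files.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_year_and_category(lines: Iterable[str]) -> Counter:
    """Count papers per (year, primary category) from JSON-lines metadata."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumed fields: "update_date" like "2020-08-05" and
        # "categories" like "cs.LG stat.ML" (first entry taken as primary).
        year = record["update_date"][:4]
        primary = record["categories"].split()[0]
        counts[(year, primary)] += 1
    return counts


# Made-up records standing in for the real metadata file.
sample = [
    '{"id": "a", "update_date": "2019-05-01", "categories": "cs.LG stat.ML"}',
    '{"id": "b", "update_date": "2020-02-11", "categories": "cs.LG"}',
    '{"id": "c", "update_date": "2020-07-30", "categories": "cs.LG cs.CV"}',
    '{"id": "d", "update_date": "2020-08-05", "categories": "hep-th"}',
]

trend = count_by_year_and_category(sample)
for (year, category), n in sorted(trend.items()):
    print(year, category, n)
```

Because the function accepts any iterable of lines, an open file handle for the downloaded metadata file could be passed in directly, which keeps memory use low on a dataset of this size.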

The arXiv dataset is now available on Kaggle and will be updated weekly.


Reporter: Yuan Yuan | Editor: Michael Sarazen


