Vroom! Vroom! New Dataset Rolls Out 64,000 Pictures of Cars

To the machine learning community, high-quality data is as vital as fuel is to a car: it’s what keeps the ML engines running. Recently, a dataset with 64,000 pictures of cars appeared on GitHub, the work of data scientist Nicolas Gervais. The Car Connection Picture Dataset is of added interest because its images are conveniently labeled by make, model, year, price, horsepower, body style and more.
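
The article does not say how the labels are attached to each image. As a minimal sketch, assuming they are encoded in the filename as underscore-separated fields in a fixed order, a label record could be recovered as follows. The field order, the `parse_car_filename` helper and the sample filename are all hypothetical, not taken from the dataset.

```python
from pathlib import Path
from typing import NamedTuple


class CarLabel(NamedTuple):
    make: str
    model: str
    year: int
    price: str        # kept as text: the unit is not stated in the article
    horsepower: str   # kept as text for the same reason
    body_style: str


def parse_car_filename(filename: str) -> CarLabel:
    """Split a hypothetical 'make_model_year_price_hp_body_id.jpg' name into labels."""
    stem = Path(filename).stem          # drop the directory and the extension
    parts = stem.split("_")
    if len(parts) < 6:
        raise ValueError(f"unexpected filename layout: {filename!r}")
    make, model, year, price, horsepower, body_style = parts[:6]
    return CarLabel(make, model, int(year), price, horsepower, body_style)


# Illustrative filename only; real files may follow a different layout.
label = parse_car_filename("Audi_A4_2019_37_188_4dr_abc.jpg")
print(label.make, label.model, label.year)
```

If the real layout differs, only the unpacking line inside `parse_car_filename` would need to change.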

Gervais first collected more than a quarter million images from the website thecarconnection.com. His focus was on exteriors, and excluding interior shots and other images left him with the 64k set, with picture sizes of about 320×210 pixels. Users can also get larger versions of the images by adjusting the settings of the included scraper, “scrape.py.”

To demonstrate the dataset’s potential in practical applications, Gervais created a car price prediction model and an Audi vs. BMW deep learning classifier in PyTorch.
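
Gervais’s classifier itself is not reproduced here. As a rough sketch of the data preparation a two-class task like this needs, the following keeps only Audi and BMW images, maps them to binary labels and makes a shuffled train/validation split. It assumes the make is the first underscore-separated token of each filename, which is a hypothetical convention; `build_binary_split` and the sample filenames are likewise illustrative.

```python
import random
from typing import List, Tuple

# Binary class mapping for the two makes of interest.
CLASSES = {"Audi": 0, "BMW": 1}

Example = Tuple[str, int]


def build_binary_split(
    filenames: List[str], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    """Return (train, val) lists of (filename, label) for Audi and BMW only."""
    labelled: List[Example] = []
    for name in filenames:
        make = name.split("_", 1)[0]    # assumed: make is the first token
        if make in CLASSES:
            labelled.append((name, CLASSES[make]))

    rng = random.Random(seed)           # fixed seed for a repeatable split
    rng.shuffle(labelled)
    n_val = int(len(labelled) * val_fraction)
    return labelled[n_val:], labelled[:n_val]


# Made-up filenames for illustration; the Acura entry is filtered out.
files = (
    [f"Audi_A4_{i}.jpg" for i in range(5)]
    + [f"BMW_X5_{i}.jpg" for i in range(5)]
    + ["Acura_ILX_0.jpg"]
)
train, val = build_binary_split(files)
print(len(train), len(val))
```

The resulting (filename, label) pairs could then be fed to whatever image loader and model one prefers. The split here is a plain shuffle, not stratified by class.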

So, what was the first thing the ML community thought of with these 64,000 pictures of cars in hand? Making fantasy rides, of course. “Seems like this would be really fun to hook up to StyleGAN2 and be able to generate cars based on those properties,” suggested Reddit user Skylion007, a sentiment echoed by others on the machine learning subreddit. StyleGAN is the hyperrealistic image generator introduced by chip giant NVIDIA in 2018. Philip Wang used it to create “This Person Does Not Exist,” a website that generates a new hyperrealistic fake human face every time it’s refreshed. The technique has since been extended to cats, Airbnb listings and anime faces. Why not cars?

Reddit exchange on the new car dataset’s potential for building dream cars with GANs.

Aside from amusing vehicle style mashups, it has also been suggested that the dataset could be used to predict future car designs or style and price trends.

Gervais is a Python software engineer with TD in Montreal and a Machine Learning and Data Science student at McGill. The Car Connection Picture Dataset is available on his GitHub.


Journalist: Fangyu Cai | Editor: Michael Sarazen
